I have a new idea for dealing with the duplicated website content.

Antora lets you include the same page in several components by using “include
stubs”, i.e. pages that contain only an include:: preprocessor directive
pointing to the single real copy.  (Includes also work in plain AsciiDoc, but
without Antora page coordinates there’s no practical way to refer to the
originating page.)
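A minimal Python sketch of what such a stub contains, assuming an Antora version that accepts a page resource ID (`version@component:module:page$file`) as an include target. The component name `common`, the `ROOT` module, and the helper name `include_stub` are illustrative assumptions, not names taken from the actual repos:

```python
from typing import Optional


def include_stub(component: str, page: str, module: str = "ROOT",
                 version: Optional[str] = None) -> str:
    """Return the text of a hypothetical include stub: a page whose only
    content is an include:: directive pointing at the single real copy."""
    # Antora page resource ID: [version@]component:module:page$relative/path.adoc
    coordinate = f"{version}@" if version else ""
    return f"include::{coordinate}{component}:{module}:page${page}[]\n"


print(include_stub("common", "singleton-ejb.adoc"), end="")
# include::common:ROOT:page$singleton-ejb.adoc[]
```

A live template in the IDE does the same job interactively; the function just makes the shape of the directive explicit.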

IntelliJ tools such as IDEA and WebStorm have a nice diff feature and
customizable live templates.  I made a live template that inserts the include::
pointing to the “actual copy” of the page.

My workflow is to compare a page in 8.0@tomee (i.e. my Antora clone of master
of the tomee git repo) with the same page in the tomee-site antora branch using
WebStorm’s side-by-side diff.  I combine the two AsciiDoc versions into
something more correct than either source, fix any other errors I see, put the
corrected version in tomee-site, and replace the tomee master copy with an
include stub.
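To pick candidate pages for that side-by-side comparison, a rough Python sketch could list the .adoc files that sit at the same relative path in both checkouts and flag which ones are already byte-identical. The directory roots and the `adoc_paths`/`shared_pages` names are hypothetical; the actual merging still happens in the IDE diff:

```python
from pathlib import Path
from typing import Dict, Set


def adoc_paths(root: Path) -> Set[str]:
    """Relative POSIX paths of every .adoc file under root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*.adoc")}


def shared_pages(master_pages: Path, site_pages: Path) -> Dict[str, bool]:
    """Map each page present in both trees to True if the copies are identical."""
    shared = adoc_paths(master_pages) & adoc_paths(site_pages)
    return {
        rel: (master_pages / rel).read_bytes() == (site_pages / rel).read_bytes()
        for rel in sorted(shared)
    }
```

Pages mapped to False are the ones worth opening in the diff; the two roots (for example, each repo's pages directory) are placeholders.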

One of those edits is carrying over the jbake attributes from master.

This reduces the number of duplicates by one and indicates:
- on master, that the page is duplicated in common
- on tomee-site, that the page is duplicated in master and has been edited to look OK.

I now have only 2 errors in 8.0@tomee, 367 in common, and 386 total.

David Jencks

> On Feb 17, 2020, at 4:39 PM, David Jencks <david.a.jen...@gmail.com> wrote:
> 
> I pushed another preview update.  It has all the content I’ve found, and,
> except for a few pages from svn that seem to originate as html and some pages
> currently generated by the tomee-site-generator, I believe it has all the
> content that is supposed to be published.
> 
> Antora reports about 450 errors when building this.  These are AsciiDoc
> syntax errors (mostly section-level errors) and broken links.
> 
> AFAICT most of the content is duplicated in 4 places: what I’m calling 
> “common”, 8.0, 7.1, and 7.0.  Fixing only one copy would be considerably 
> quicker than comparing and fixing 4.
> 
> My extremely biased opinion is that the content is in approximately as good
> shape as the live site.  There is decidedly less organization and navigation.
> Some pages are worse and some are better, but it’s all presented uniformly.
> 
> TomEE Documentation AWS 
> <https://tomee-preview.s3-us-west-2.amazonaws.com/index.html>
> 
> Thanks
> David Jencks
> 
>> On Feb 16, 2020, at 7:41 PM, David Jencks <david.a.jen...@gmail.com> wrote:
>> 
>> Thanks for the additional info.  I have plenty to do, so no hurry, but if you
>> do have some time, I have some quick questions…
>> 
>> Is https://svn.apache.org/repos/asf/tomee/site/trunk/content the “same as”
>> the website content, so that, modulo .md and .mdtext to html conversion, I
>> can look at it and treat it as the website to mimic?  (Also modulo files that
>> have no source.)
>> 
>> Is all the source from tomee-site, tomee-site-generator (including
>> java-generated content), and tomee?  So anything in the above svn repo that
>> isn’t present, in some format, in one of these can be left out?
>> 
>> I’d think there must be some configuration for the tomee-site-staging
>> builder, but I can’t find it.
>> 
>> IMO the site in any form is in such a mess that it’s hard to know where to
>> work first.  There are tons of formatting problems, both in the existing site
>> and in the .adoc translations, and loads of broken links that have no
>> plausible target.  For instance, one of the pages you mention has
>> 
>> \{include:OPENEJBx30:Singleton Beans}
>> 
>> I didn’t touch that one :-) It’s a bit confusing because there’s no 
>> OPENEJBx30 anywhere in sight, but I expect it’s supposed to refer to the 
>> same component/version.  When I get to it it will turn into a redirect.
>> 
>> I’m mostly concentrating on finding all the sources, getting them into some 
>> sort of semi-coherent structure, and fixing formatting and links that 
>> asciidoctor complains about.  I thought I was nearly done, but now there are 
>> a lot more files :-)
>> 
>> After the automatically recognized problems are fixed, examining the pages 
>> for other problems can start.
>> 
>> Thanks for the hints!
>> 
>> David Jencks
>> 
>> 
>> 
>>> On Feb 16, 2020, at 6:58 PM, David Blevins <david.blev...@gmail.com> wrote:
>>> 
>>> Hi David,
>>> 
>>> Looks like you got there in the end.  I attempted to give a heads up on 
>>> that in my first email about the Apache CMS, but all this is complicated.  
>>> Read this then go back and read my first email on the "Documentation Site" 
>>> thread and hopefully it makes more sense.
>>> 
>>> It's very hard to describe it with magically the right level of detail.  
>>> Here's the way too short version.
>>> 
>>> - https://github.com/apache/tomee-site-generator spits html into here
>>> - https://svn.apache.org/repos/asf/tomee/site/trunk/content/ which triggers
>>> this Apache CMS job
>>> - https://ci.apache.org/builders/tomee-site-staging which takes any html
>>> and also converts the mdtext files and puts them here
>>> - /usr/local/websites/tomee/trunk which is a private svn repo that
>>> publishes to here
>>> - http://tomee.staging.apache.org which can only get published if a human
>>> visits here
>>> - https://cms.apache.org/tomee/publish and clicks the button so all html
>>> (CMS and otherwise) gets published here
>>> - http://tomee.apache.org/
>>> 
>>> What we truly need, more than a switch from Jbake to Antora, is to get rid
>>> of the Apache CMS, as that would cut out 4 of those 7 bullets, leaving us
>>> with:
>>> 
>>> - https://github.com/apache/tomee-site-generator spits html into here
>>> - https://github.com/apache/tomee-<some-new-repo> which causes Apache's
>>> new infra to publish here
>>> - http://tomee.apache.org/
>>> 
>>> Unfortunately this repo appears to be a honeypot (a dead end that can only
>>> confuse).  It used to be a mirror of
>>> svn.apache.org/repos/asf/tomee/site/trunk/content, but it looks like the
>>> sync stopped about 2 years ago.
>>> 
>>> 
>>> All the work of the last 10 days or so has been on replacing the Javadoc
>>> and Jbake parts.  Those have room for improvement, and Antora can definitely
>>> be part of that improvement, but the big win is ditching the CMS, which
>>> technically could happen now.
>>> 
>>> Replacing the CMS probably needs its own email.
>>> 
>>> I mentioned this in the first email as well: the badly formatted page you
>>> pointed out at the start actually wasn't a Jbake issue.  It was one of many
>>> legacy CMS files that weren't fully converted out of the specialized
>>> Markdown format.  These issues exist in your Antora prototype as well:
>>> 
>>> - https://tomee-preview.s3-us-west-2.amazonaws.com/tomee/8.0/singleton-ejb.html
>>> 
>>> And truthfully our content issues date back to our Confluence-based website 
>>> days as there's still a very small bit of Confluence wiki markup hanging 
>>> around:
>>> 
>>> - https://tomee-preview.s3-us-west-2.amazonaws.com/tomee/8.0/local-server.html
>>> 
>>> Every time we make a website-tech switch, the content takes a hit and
>>> yesterday's content issues get rolled into the newly created content issues.
>>> 
>>> Though I see our content issues as unrelated to Jbake or Antora, just plain
>>> content issues fixable with either solution, I still support some Antora
>>> usage.  I do think we need to scale back what we're aiming at with Antora,
>>> however.
>>> 
>>> I'll try to post on that, but it takes me hours to write emails and I still 
>>> have a board report to do and I'm traveling in the morning, so no promises 
>>> :)
>>> 
>>> 
>>> -- 
>>> David Blevins
>>> http://twitter.com/dblevins
>>> http://www.tomitribe.com
>>> 
>>>> On Feb 16, 2020, at 3:56 PM, David Jencks <david.a.jen...@gmail.com> wrote:
>>>> 
>>>> I’ve discovered that the svn repo automatically converts .mdtext files to
>>>> .html, so my conclusions about how much of tomee-site is currently
>>>> published are wrong.  I’ll redo my calculations.
>>>> 
>>>> Thanks
>>>> David Jencks
>>>> 
>>>>> On Feb 16, 2020, at 8:40 AM, David Jencks <david.a.jen...@gmail.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I’d like some verification that my conclusions about content are 
>>>>> reasonable….
>>>>> 
>>>>> The content sources I know about are:
>>>>> 
>>>>> tomee (7-8 docs and examples)
>>>>> 
>>>>> tomee-site-generator (older content)
>>>>> 
>>>>> tomee-site (?)
>>>>> 
>>>>> My understanding is the site is published using svnpubsub, so that svn 
>>>>> repo reflects what is actually visible on the site.
>>>>> 
>>>>> After doing some set arithmetic I’ve discovered that no content on the
>>>>> current site necessarily comes from tomee-site: there’s a lot of overlap
>>>>> in content between tomee-site and tomee-site-generator, but nothing from
>>>>> tomee-site that is missing from tomee-site-generator is on the website.
>>>>> 
>>>>> Is this reasonable?
>>>>> 
>>>>> Is there anything from tomee-site not currently published that _should_ 
>>>>> be added to the site?  According to my calculations, there are about 445 
>>>>> pages in tomee-site that aren’t currently published. 
>>>>> 
>>>>> If anyone wants to study the situation, I recommend looking at my git
>>>>> repos, where all the content I’ve found is similarly organized, and at the
>>>>> summary in comparison.json
>>>>> <https://github.com/djencks/tomee/blob/antora/docs/comparison.json>,
>>>>> calculated using old-new-compare.js
>>>>> <https://github.com/djencks/tomee/blob/antora/docs/old-new-compare.js>.
>>>>> 
>>>>> There’s also quite a bit of content with no source; as I’ve mentioned 
>>>>> before I think this is a never-cleaned-up leftover from a previous 
>>>>> version of the site.
>>>>> 
>>>>> Thanks
>>>>> David Jencks
>>>>> 
>>>>> ps. By “necessarily” I mean that all the pages in svn that could have come
>>>>> from tomee-site could also have come from tomee-site-generator.  I don’t
>>>>> really know where they actually came from, although I could probably
>>>>> calculate it.
>>>>> 
>>>>> pps. For nitpickers: there may appear to be two unique files in
>>>>> tomee-site.  One, security/index, is the same as security/security; I’ve 
>>>>> provided a redirect.  The other, documentation, is some sort of site 
>>>>> index or navigation page, possibly generated.  I heavily edited the 
>>>>> version in my repo before realizing it was not needed as-is.
>> 
> 
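The set arithmetic described in the Feb 16, 8:40 AM message can be sketched with Python sets. This assumes each input is a set of page paths normalized to a common form (for example, relative path without extension) taken from the svn content tree and from the tomee-site, tomee-site-generator, and tomee repos. The `classify` helper and its parameter names are hypothetical, and this is not the logic of old-new-compare.js:

```python
from typing import NamedTuple, Set


class PageSets(NamedTuple):
    only_from_site: Set[str]    # published, and tomee-site-generator lacks it
    unpublished_site: Set[str]  # in tomee-site but not on the published site
    no_source: Set[str]         # published, but found in none of the source repos


def classify(published: Set[str], site: Set[str], generator: Set[str],
             tomee: Set[str]) -> PageSets:
    """Compare published svn content with the candidate source repos."""
    return PageSets(
        only_from_site=(published & site) - generator,
        unpublished_site=site - published,
        no_source=published - site - generator - tomee,
    )


result = classify(
    published={"index", "security/security", "old-page"},
    site={"index", "security/security", "security/index", "draft"},
    generator={"index", "security/security"},
    tomee=set(),
)
print(result.only_from_site)            # set()
print(sorted(result.unpublished_site))  # ['draft', 'security/index']
print(result.no_source)                 # {'old-page'}
```

An empty `only_from_site` corresponds to the thread's finding that nothing published necessarily came from tomee-site, `unpublished_site` to the roughly 445 unpublished tomee-site pages, and `no_source` to the leftover content with no source.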
