Good to know; I’ll record this as positive news ;) Feel free to give me an update if you encounter similar behavior again.

On Mon, May 14, 2018 at 8:40 PM, Eliot Kimber <[email protected]> wrote:
> Hmm.
>
> In the process of testing my test data set I can't reproduce the earlier
> behavior.
>
> In my current tests, using the same data and the same BaseX version, I get a
> maximum of maybe 1GB for the largest file but just a few hundred MBs once
> everything is loaded.
>
> For 3800 topics of roughly 50K each (on average) it takes just a couple of
> seconds to load them with no DTDs, a minute or so with DTDs, which is
> consistent with the time cost of reparsing the (large) DITA grammars for each
> topic.
>
> So not sure what was happening when I tried this before but I definitely
> rebooted and installed macOS updates since then, so could have been some Java
> issue or who knows what else.
>
> The good news is that even without grammar caching the DITA topics do load in
> a reasonable (if not ideal) amount of time and with appropriate memory usage.
>
> Cheers,
>
> E.
>
> --
> Eliot Kimber
> http://contrext.com
>
>
> On 5/14/18, 12:53 PM, "Eliot Kimber"
> <[email protected] on behalf of
> [email protected]> wrote:
>
> > Yes, I wouldn't expect the grammars to chew up gigabytes. I'll provide a
> > test data set for you.
> >
> > Cheers,
> >
> > E.
> >
> > --
> > Eliot Kimber
> > http://contrext.com
> >
> >
> > On 5/14/18, 12:45 PM, "Christian Grün" <[email protected]> wrote:
> >
> > > I would have expected some MBs to be sufficient for parsing even
> > > complex DTDs if nothing is cached (but caching could definitely speed
> > > up processing), so maybe there’s still something that we could have a
> > > look at. If you are interested, feel free to provide me with your
> > > files via a private message.
> > >
> > >
> > > On Mon, May 14, 2018 at 7:40 PM, Eliot Kimber <[email protected]>
> > > wrote:
> > > > Yes, I would want caching on by default with the option to turn it
> > > > off. I'm assuming it's currently not turned on (but to be honest I
> > > > haven't taken the time to check the source code).
> > > >
> > > > Certainly for DITA content grammar caching is the only practical
> > > > way to parse a large number of topics in the same JVM without both
> > > > using lots of memory and eating an avoidable processing cost of
> > > > re-processing the grammar files again for each document.
> > > >
> > > > DITA is probably somewhat unique in this regard because it takes
> > > > such a different approach to grammar organization and use than
> > > > pretty much any other XML application.
> > > >
> > > > Cheers,
> > > >
> > > > E.

