Good to know; I’ll record this as positive news ;) Feel free to give
me an update if you encounter similar behavior again.


On Mon, May 14, 2018 at 8:40 PM, Eliot Kimber <[email protected]> wrote:
> Hmm.
>
> While testing my data set, I can't reproduce the earlier behavior.
>
> In my current tests, using the same data and the same BaseX version, I see
> peak memory usage of maybe 1 GB for the largest file, but just a few hundred
> MB once everything is loaded.
>
> For 3,800 topics of roughly 50 KB each on average, it takes just a couple of
> seconds to load them without DTDs and a minute or so with DTDs, which is
> consistent with the time cost of reparsing the (large) DITA grammars for each
> topic.
>
> So I'm not sure what was happening when I tried this before, but I have
> definitely rebooted and installed macOS updates since then, so it could have
> been a Java issue or who knows what else.
>
> The good news is that even without grammar caching the DITA topics do load in 
> a reasonable (if not ideal) amount of time and with appropriate memory usage.
>
> Cheers,
>
> E.
>
> --
> Eliot Kimber
> http://contrext.com
>
>
> On 5/14/18, 12:53 PM, "Eliot Kimber" 
> <[email protected] on behalf of 
> [email protected]> wrote:
>
>     Yes, I wouldn't expect the grammars to chew up gigabytes. I'll provide a
>     test data set for you.
>
>     Cheers,
>
>     E.
>
>     --
>     Eliot Kimber
>     http://contrext.com
>
>
>     On 5/14/18, 12:45 PM, "Christian Grün" <[email protected]> wrote:
>
>         I would have expected a few MB to be sufficient for parsing even
>         complex DTDs if nothing is cached (but caching could definitely speed
>         up processing), so maybe there’s still something that we could have a
>         look at. If you are interested, feel free to provide me with your
>         files via a private message.
>
>
>
>         On Mon, May 14, 2018 at 7:40 PM, Eliot Kimber <[email protected]>
>         wrote:
>         > Yes, I would want caching on by default with the option to turn it
>         > off. I'm assuming it's currently not turned on (but to be honest I
>         > haven't taken the time to check the source code).
>         >
>         > For DITA content, grammar caching is certainly the only practical
>         > way to parse a large number of topics in the same JVM without both
>         > using lots of memory and paying the avoidable cost of reprocessing
>         > the grammar files for each document.
>         >
>         > DITA is probably somewhat unique in this regard because it takes
>         > such a different approach to grammar organization and use than
>         > pretty much any other XML application.
>         >
>         > Cheers,
>         >
>         > E.
