Hi Ludovic,

>  we do need (unfortunately) the resource index during the initial load
> because there's an external component that perform some calculations
> based on what digital objects are created per FOXML ingest.

If that is the case, can we assume you are either flushing the triple
store when necessary, or setting syncUpdates = true in the
configuration?  

I was involved with a project 4-5 years ago that had a lot of trouble
with large ingests in this kind of configuration, using the Kowari
triple store.  The only solution at the time (besides re-architecting
their software) was to use a different triple store, hence the
development of MPTStore.

If I remember correctly, Mulgara eventually reached the point where the
original problems (performance and stability) had been fixed or reduced,
but I haven't attempted a large ingest (> 5 million objects) in that
configuration recently.  Perhaps there is a regression somewhere?

> > since we create thousands of digital objects during consecutive
> > hours (with of course creation of content datastreams and setting
> > relationships between digital objects)

> > this lead to the creation within the JVM of many objects by Mulgara
> > (this is our understanding).

This should not cause that many objects to be created in the heap.
After commit, most of the data will have been written to the index and
string pool files, so I would expect the number of objects in memory to
be relatively constant or possibly logarithmically increasing - unless
there is a bug.

> > error: *java.lang.OutOfMemoryError: GC overhead limit exceeded*

Try specifying a different garbage collector, and see how it behaves.
From my understanding, this error is raised when the ratio of time spent
garbage collecting to free space recovered becomes too unfavorable.
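
To "see how it behaves" with some numbers, here is a minimal sketch
using the standard java.lang.management API.  It is an illustration
only: the class name GcOverheadProbe and its helper methods are made up
for this example, and it reports on the JVM it runs in, so it would have
to be run inside the Fedora JVM (or adapted to remote JMX) to say
anything about the ingest.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Fedora or Mulgara.
public class GcOverheadProbe {

    /** Sums the cumulative collection time (ms) reported by every collector. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Fraction of elapsed time spent in GC; 0.0 if no time has elapsed. */
    static double gcTimeFraction(long gcMillis, long elapsedMillis) {
        return elapsedMillis <= 0 ? 0.0 : (double) gcMillis / elapsedMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        // One line per collector: name, number of collections, total time.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("GC time fraction since JVM start: %.3f%n",
                gcTimeFraction(totalGcMillis(), uptime));
    }
}
```

Sampling these figures periodically during the load, once per collector
setting, should show whether the fraction climbs steadily as the ingest
proceeds or stays roughly flat.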


> > After some research and tuning, the only option we see right now is
> > to make some pauses during the initial load process to let the JVM
> > takes times to execute normal GC operations.

Does this work?

  -Aaron





_______________________________________________
Fedora-commons-users mailing list
Fedora-commons-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
