Hi,

I want to use the Content Enhancement component of Stanbol with a custom 
vocabulary which contains about 1.5 million triples. I followed the 
instructions for creating a local index using the Entityhub Indexing Tool and 
everything worked as expected. However, those 1.5 million triples are only 
the initial import; afterwards, the vocabulary should be managed via the 
REST API. I would therefore prefer to have a Managed Site or use the 
Entityhub itself for storing my RDF data. I tried importing my triples 
(which are distributed over 
15 .nt files) via the /entityhub/entity?update=true endpoint, however I ran 
into problems, most likely because of the size of the import. The Java 
application which sends the REST calls to the Stanbol API throws a 
"java.net.SocketException: Unexpected end of file from server" for each file, 
and even after my program has finished, Stanbol keeps processing the 
submitted data for hours. The error log repeatedly states "PERFORMANCE 
WARNING: Overlapping onDeckSearchers=2".
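
Since N-Triples is a line-based format, one possible workaround would be to 
split each file into smaller request bodies and send them one after another. 
The following is only a minimal sketch of that idea, not a tested solution. 
The host and port, the "text/rdf+nt" media type, the batch size and the class 
name BatchedNtImport are all assumptions; only the 
/entityhub/entity?update=true path is the endpoint mentioned above. The 
sketch also assumes that the triples in each file are grouped by subject, and 
it never cuts a batch in the middle of a subject, because I am not sure 
whether an update merges with or replaces the data already stored for an 
entity.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class BatchedNtImport {

    // Assumption: Stanbol runs locally on port 8080; adjust host and port.
    static final String ENDPOINT =
            "http://localhost:8080/entityhub/entity?update=true";

    // Assumption: media type for N-Triples; verify what the server accepts.
    static final String NT_MEDIA_TYPE = "text/rdf+nt";

    /**
     * Splits N-Triples lines into request bodies of roughly batchSize triples.
     * A batch is only closed when the subject changes, so all triples of one
     * subject stay together (assumes the input is grouped by subject).
     */
    static List<String> batches(List<String> lines, int batchSize) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int count = 0;
        String lastSubject = null;
        for (String line : lines) {
            String triple = line.trim();
            // Skip blank lines and N-Triples comments.
            if (triple.isEmpty() || triple.startsWith("#")) {
                continue;
            }
            String subject = triple.split("\\s+", 2)[0];
            if (count >= batchSize && !subject.equals(lastSubject)) {
                result.add(current.toString());
                current.setLength(0);
                count = 0;
            }
            current.append(triple).append('\n');
            count++;
            lastSubject = subject;
        }
        if (count > 0) {
            result.add(current.toString());
        }
        return result;
    }

    /** POSTs one batch and returns the HTTP status code. */
    static int post(String body) throws IOException {
        HttpURLConnection con =
                (HttpURLConnection) new URL(ENDPOINT).openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", NT_MEDIA_TYPE);
        // Generous read timeout, since indexing a batch may take a while.
        con.setReadTimeout(10 * 60 * 1000);
        try (OutputStream out = con.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        int status = con.getResponseCode();
        con.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path of one .nt file.
        for (String file : args) {
            List<String> lines =
                    Files.readAllLines(Paths.get(file), StandardCharsets.UTF_8);
            for (String body : batches(lines, 10_000)) {
                System.out.println(file + " -> HTTP " + post(body));
            }
        }
    }
}
```

Sending the batches sequentially and waiting for each response should at 
least avoid having several large uploads processed at the same time, but I do 
not know whether this is the intended way to do a bulk import.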

What would you suggest is the best approach for importing such a large amount 
of triples into the Entityhub?

Furthermore, could you please explain whether there is any difference between 
using a Managed Site and the Entityhub itself? From what I understood from the 
documentation, the only advantage of a Managed Site is the fact that it can be 
used to separate multiple vocabularies from each other. Is there any other 
difference?

Any help is much appreciated!

Best regards,
Marvin Luchs
