Hi, I want to use the Content Enhancement component of Stanbol with a custom vocabulary that contains about 1.5 million triples. I followed the instructions for creating a local index with the Entityhub Indexing Tool and everything worked as expected. However, as those 1.5 million triples are only the initial import and the vocabulary should afterwards be managed via the REST API, I would prefer to have a Managed Site or to use the Entityhub itself for storing my RDF data.

I tried importing my triples (which are distributed over 15 .nt files) via the /entityhub/entity?update=true endpoint, but I ran into problems, most likely because of the size of the import. The Java application that sends the REST calls to the Stanbol API gets a "java.net.SocketException: Unexpected end of file from server" for each file, and even after my program has finished, Stanbol keeps processing the submitted data for hours. The error log repeatedly states "PERFORMANCE WARNING: Overlapping onDeckSearchers=2".
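For reference, here is a simplified sketch of how each file is currently submitted. The host, port, file name and Content-Type shown here are placeholders, not necessarily what my real code uses; essentially, each .nt file is streamed as the body of a single POST request:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class EntityhubImport {

    public static void main(String[] args) throws Exception {
        // placeholder file; the real code loops over all 15 .nt files
        Path ntFile = Paths.get("vocabulary-part-01.nt");
        URL url = new URL("http://localhost:8080/entityhub/entity?update=true");

        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        // N-Triples payload; the actual Content-Type my code sends may differ
        con.setRequestProperty("Content-Type", "application/n-triples");

        try (OutputStream out = con.getOutputStream();
             InputStream in = Files.newInputStream(ntFile)) {
            in.transferTo(out); // streams the whole file in one request (Java 9+)
        }
        System.out.println("HTTP " + con.getResponseCode());
        con.disconnect();
    }
}

Each request therefore carries roughly a fifteenth of the 1.5 million triples, which is presumably where the SocketException comes from.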
What would you suggest is the best approach for importing such a large amount of triples into the Entityhub? Furthermore, could you please explain whether there is any difference between using a Managed Site and the Entityhub itself? From what I understand from the documentation, the only advantage of a Managed Site is that it can be used to keep multiple vocabularies separate from each other. Is there any other difference? Any help is much appreciated!

Best regards,
Marvin Luchs