[
https://issues.apache.org/jira/browse/ANY23-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302682#comment-14302682
]
Peter Ansell commented on ANY23-248:
------------------------------------
Thanks for looking further into this, Souri. If that was the fix, the problem is
definitely classpath searching, not libraries missing from the Maven
dependencies.
It is possible that Sesame is using a classloader inside the Rio.createParser
method that somehow does not have access to Semargl in the context of Hadoop.
Currently it uses the class's own classloader, which may not have a view of the
Semargl jar file/classes in the Hadoop classloader model.
The parsers are found starting at the code here:
https://bitbucket.org/openrdf/sesame/src/6275c3e0d504df76edb16396c11e67f07c72439c/core/rio/api/src/main/java/org/openrdf/rio/RDFParserRegistry.java?at=2.7.x
Internally, the constructor for that class eventually reaches the following
code, which ends up calling RDFParser.class.getClassLoader(); that classloader
may not be useful in your case:
https://bitbucket.org/openrdf/sesame/src/6275c3e0d504df76edb16396c11e67f07c72439c/core/util/src/main/java/info/aduna/lang/service/ServiceRegistry.java?at=2.7.x#cl-45
If there is a more robust approach that works on Hadoop, please submit a pull
request to the Sesame Bitbucket repository and I will review it there. In
particular, it may be possible to use the thread context classloader, which may
have more classes in scope at that point, but you would need to experiment on
Hadoop to see which classes it is able to find.
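The suggestion above could look roughly like this. This is a sketch of the fallback strategy, not Sesame's actual ServiceRegistry code: prefer the thread context classloader when one is set, and fall back to the service interface's own classloader otherwise.

```java
import java.util.ServiceLoader;

// Sketch: load service providers via the thread context classloader when
// available, falling back to the service class's own loader. In a Hadoop
// task the context classloader may see jars that the framework-level
// classloader (which loaded Sesame) cannot.
public final class ContextAwareLoading {

    public static <S> ServiceLoader<S> loadServices(Class<S> service) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = service.getClassLoader();
        }
        return ServiceLoader.load(service, cl);
    }
}
```

Whether this actually finds Semargl under Hadoop depends on how the job's classloaders are wired up, which is why it needs experimenting on the cluster.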
> NTriplesWriter on hadoop : issue with MIME type
> -----------------------------------------------
>
> Key: ANY23-248
> URL: https://issues.apache.org/jira/browse/ANY23-248
> Project: Apache Any23
> Issue Type: Bug
> Affects Versions: 1.1
> Environment: hadoop,linux
> Reporter: Souri
> Priority: Minor
> Fix For: 1.2
>
>
> I am trying to create N-Triples from an HTML string. I am using the following
> code to do it:
> StringDocumentSource documentSource = new StringDocumentSource(html, null);
> ByteArrayOutputStream out = new ByteArrayOutputStream();
> final NTriplesWriter tripleHandler = new NTriplesWriter(out);
> Any23 runner = new Any23();
>
> runner.extract(documentSource,tripleHandler);
> tripleHandler.close();
> String result = out.toString("us-ascii");
> return result;
> This is giving me the error:
> java.lang.NullPointerException
> at
> org.apache.any23.extractor.SingleDocumentExtraction.filterExtractorsByMIMEType(SingleDocumentExtraction.java:421)
> at
> org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:223)
> at org.apache.any23.Any23.extract(Any23.java:298)
> at org.apache.any23.Any23.extract(Any23.java:433)
> I am running this in Hadoop. It works when I run locally with a single file,
> but fails when run on Hadoop.
> Can someone please tell me how to go about this issue?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)