[ https://issues.apache.org/jira/browse/ANY23-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hans Brende updated ANY23-415:
------------------------------
Description:
Since the NTriplesExtractorFactory includes a content type of "text/plain",
this causes every plain text file to be processed by the NTriplesExtractor,
which in turn causes huge numbers of completely unnecessary fatal issues to be
sent to the extraction report.
In my crawls, this mostly occurs for all the "humans.txt" files encountered.
While this isn't a hugely serious bug, it is quite irritating as it does really
clutter up my logs.
Note: the NQuadsExtractorFactory (which can parse all the same documents as
NTriples) does *not* include a content type of "text/plain".
was:
Since the NTriplesExtractorFactory includes a content type of "text/plain",
this causes every plain text file to be processed by the NTriplesExtractor,
which in turn causes huge numbers of completely unnecessary fatal issues to be
sent to the extraction report.
In my crawls, this mostly occurs for all the "humans.txt" files encountered.
While this isn't a hugely serious bug, it is quite irritating as it does really
clutter up my logs.
> NTriplesExtractor tries all text/plain files, causing numerous fatal issues
> ---------------------------------------------------------------------------
>
> Key: ANY23-415
> URL: https://issues.apache.org/jira/browse/ANY23-415
> Project: Apache Any23
> Issue Type: Bug
> Components: extractors
> Affects Versions: 2.3
> Reporter: Hans Brende
> Priority: Minor
> Fix For: 2.3
>
>
> Since the NTriplesExtractorFactory includes a content type of "text/plain",
> this causes every plain text file to be processed by the NTriplesExtractor,
> which in turn causes huge numbers of completely unnecessary fatal issues
> to be sent to the extraction report.
> In my crawls, this mostly occurs for all the "humans.txt" files encountered.
> While this isn't a hugely serious bug, it is quite irritating as it does
> really clutter up my logs.
>
> Note: the NQuadsExtractorFactory (which can parse all the same documents as
> NTriples) does *not* include a content type of "text/plain".
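
To illustrate the selection behaviour described above, here is a minimal,
self-contained sketch in Java. It is not Any23's actual registry or API: the
class name MimeSelectionSketch, the SUPPORTED map, the select method, and the
exact content-type lists are all assumptions made for illustration. The only
detail taken from the report is that the N-Triples entry lists "text/plain"
and the N-Quads entry does not.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of MIME-type-based extractor selection.
// Names and registrations are illustrative assumptions, not Any23 code.
class MimeSelectionSketch {

    // Assumed registrations, modeled on the report:
    // only the N-Triples entry claims "text/plain".
    static final Map<String, Set<String>> SUPPORTED = new LinkedHashMap<>();
    static {
        SUPPORTED.put("ntriples", Set.of("application/n-triples", "text/plain"));
        SUPPORTED.put("nquads", Set.of("application/n-quads"));
    }

    // Returns the names of all extractors whose declared content types
    // contain the detected MIME type of the document.
    static List<String> select(String mimeType) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : SUPPORTED.entrySet()) {
            if (entry.getValue().contains(mimeType)) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // A file such as humans.txt is typically detected as text/plain,
        // so in this sketch it is handed to the N-Triples entry only.
        System.out.println(select("text/plain"));
        System.out.println(select("application/n-quads"));
    }
}
```

Under these assumptions, any document detected as "text/plain" is routed to
the N-Triples entry, which is how a file like humans.txt would end up being
parsed as N-Triples and failing. Dropping "text/plain" from that entry would
leave such files unmatched, in line with the N-Quads registration.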
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)