GitHub user lewismc opened a pull request:
https://github.com/apache/any23/pull/24
Initial move towards addressing ANY23-280 Refactor ContentExtractor to
improve extraction flexibility
Hi Folks,
This is an initial crack at addressing
https://issues.apache.org/jira/browse/ANY23-280
Essentially, the main API difference is the complete removal of the nested
`public interface ContentExtractor extends Extractor<InputStream>` from the
`Extractor` interface in the api module.
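For readers less familiar with the api module, the general shape of the change can be sketched as follows. This is a heavily simplified, hypothetical illustration, not Any23's actual API: the real `Extractor` interface has a richer `run` signature and further members, and `LineExtractor` and `extractAll` are invented for this example. It assumes that, once the nested `ContentExtractor` is gone, a stream-based extractor simply implements the generic interface with `InputStream` as its type argument.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ExtractorSketch {

    /** Hypothetical, heavily simplified stand-in for a generic Extractor<Input>. */
    interface Extractor<Input> {
        // Hypothetical signature: extracted "statements" are collected as strings.
        void run(Input in, List<String> out) throws IOException;
    }

    // Before (simplified): a nested marker interface pinned the input type, e.g.
    //   interface ContentExtractor extends Extractor<InputStream> { ... }
    // After (this sketch): no such nested interface; implementors pick the type argument.

    /** Hypothetical stream-based extractor: each non-blank line becomes one "statement". */
    static class LineExtractor implements Extractor<InputStream> {
        @Override
        public void run(InputStream in, List<String> out) throws IOException {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String trimmed = line.trim();
                    if (!trimmed.isEmpty()) {
                        out.add(trimmed);
                    }
                }
            }
        }
    }

    /** Hypothetical helper that works for an extractor of any input type. */
    static <I> List<String> extractAll(Extractor<I> extractor, I input) throws IOException {
        List<String> out = new ArrayList<>();
        extractor.run(input, out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "a b c\n\nd e f\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(extractAll(new LineExtractor(), in)); // prints [a b c, d e f]
    }
}
```

In this sketch the input type is carried by the type parameter alone, so a caller such as `extractAll` can drive any extractor without a dedicated marker interface per input kind.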
This patch still has a long way to go, with numerous failing tests, but I
wanted to post it for feedback.
Although Any23 still builds with `-DskipTests`, without that flag the
following tests fail:
```
Results :

Failed tests:
  Any23Test.testDemoCodeSnippet1:201
  Any23Test.testN3Detection1:92->assertDetection:661
  Any23Test.testN3Detection2:97->assertDetection:661
  Any23Test.testTTLDetection:87->assertDetection:661
  RoverTest.testRunMultiURLs:104->runWithMultiSourcesAndVerify:134 Unexpected number of statements.
Tests in error:
  Any23Test.testProgrammaticExtraction:279 » NullPointer
  CSVExtractorTest.testExtractionCommaSeparated:49->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
  CSVExtractorTest.testExtractionEmptyValue:112->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
  CSVExtractorTest.testExtractionSemicolonSeparated:64->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
  CSVExtractorTest.testExtractionTabSeparated:79->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
  CSVExtractorTest.testTypeManagement:94->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
  RDFa11ExtractorTest>AbstractRDFaExtractorTestCase.testDrupalTestPage:124->AbstractExtractorTestCase.assertExtract:217->AbstractExtractorTestCase.assertExtract:200->AbstractExtractorTestCase.extract:185 » NullPointer
  RDFaExtractorTest>AbstractRDFaExtractorTestCase.testDrupalTestPage:124->AbstractExtractorTestCase.assertExtract:217->AbstractExtractorTestCase.assertExtract:200->AbstractExtractorTestCase.extract:185 » NullPointer

Tests run: 403, Failures: 5, Errors: 8, Skipped: 11
```
You will see that some of the tests concern
https://issues.apache.org/jira/browse/ANY23-267 as well.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/lewismc/any23 ANY23-280
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/any23/pull/24.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #24
----
commit 801f2f93967bfd1295700223085eef3f54181517
Author: Lewis John McGibbney <[email protected]>
Date: 2016-04-06T19:44:35Z
Initial move towards addressing ANY23-280 Refactor ContentExtractor to
improve extraction flexibility
----