[GitHub] [any23] HansBrende commented on issue #104: Any23 295: Implement ability to use librdfa

GitBox Thu, 12 Sep 2019 18:46:50 -0700

HansBrende commented on issue #104: Any23 295: Implement ability to use librdfa
URL: https://github.com/apache/any23/pull/104#issuecomment-531068423
 
 
   @lewismc My first thought is: if the performance of this module is not as 
good as that of our current implementation, then in its current form, what is 
the added value?
   
   My second thought is: the benchmarks do not test the Any23 `Extractor` 
wrappers around these rdf4j parsers, only the underlying parsers themselves. 
However, in Any23's `BaseRDFExtractor`, because of a number of bugs in the 
semargl HTML parser, we had to preprocess the input stream with jsoup before 
passing it to the underlying parser. I am curious whether the `librdfa` parser 
has any of those same HTML parsing bugs. If it does _not_, and if I can move 
the preprocessing logic out of `BaseRDFExtractor` and into the semargl parser 
specifically, and **if** the librdfa parser still passes the entire test suite 
without the jsoup-preprocessed stream, then there would be a much better case 
for including it (without the preprocessing overhead, its performance would 
then likely eclipse our current RDFa performance).
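
   A rough sketch of the refactoring described above, not the actual Any23 
code: the base extractor exposes a preprocessing hook that does nothing by 
default, and only the semargl-backed extractor overrides it. Every name here 
(`SketchParser`, `SketchExtractor`, `SemarglStyleExtractor`, 
`LibrdfaStyleExtractor`, `PreprocessSketch`) is hypothetical, and a plain 
`trim()` stands in for the real jsoup clean-up so the example needs only the 
JDK (Java 9+ for `readAllBytes`).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for an underlying RDFa parser (not an rdf4j type). */
interface SketchParser {
    String parse(InputStream in) throws IOException;
}

/** Hypothetical base extractor: preprocessing is a no-op unless overridden. */
abstract class SketchExtractor {
    private final SketchParser parser;

    SketchExtractor(SketchParser parser) {
        this.parser = parser;
    }

    /** Hook for parser-specific clean-up; the default passes the stream through. */
    protected InputStream preprocess(InputStream in) throws IOException {
        return in;
    }

    /** Runs the hook, then hands the resulting stream to the parser. */
    final String extract(InputStream in) {
        try {
            return parser.parse(preprocess(in));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Only this extractor pays the preprocessing cost. */
final class SemarglStyleExtractor extends SketchExtractor {
    SemarglStyleExtractor(SketchParser parser) {
        super(parser);
    }

    @Override
    protected InputStream preprocess(InputStream in) throws IOException {
        String html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        // Assumption: trimming is only a placeholder for the real jsoup clean-up.
        return new ByteArrayInputStream(html.trim().getBytes(StandardCharsets.UTF_8));
    }
}

/** Receives the raw stream, since it inherits the no-op hook. */
final class LibrdfaStyleExtractor extends SketchExtractor {
    LibrdfaStyleExtractor(SketchParser parser) {
        super(parser);
    }
}

class PreprocessSketch {
    /** Echo parser: returns exactly the text it was handed. */
    static final SketchParser ECHO =
            in -> new String(in.readAllBytes(), StandardCharsets.UTF_8);

    /** Shows what the underlying parser receives for a given extractor. */
    static String seenBy(SketchExtractor extractor, String html) {
        return extractor.extract(
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        String raw = "  <p property=\"name\">x</p>  ";
        System.out.println("[" + seenBy(new SemarglStyleExtractor(ECHO), raw) + "]");
        System.out.println("[" + seenBy(new LibrdfaStyleExtractor(ECHO), raw) + "]");
    }
}
```

   With this shape, running the existing test suite against the librdfa-style 
extractor would exercise the parser on the raw, unpreprocessed stream, which is 
the condition the comparison above depends on.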


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
