[
https://issues.apache.org/jira/browse/ANY23-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated ANY23-37:
--------------------------------------
Attachment: ANY23-37-v2.patch
OK so this patch also removes the dsiutils and fastutil libraries from the
basic-crawler pom.xml.
There will still be a compile-time error, though. This is because getHTML() is
no longer available on Page in the newer version of Crawler4j. Around lines
89-98, we should instead be doing something like:
{code}
if (page.getParseData() instanceof HtmlParseData) {
    HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
    String html = htmlParseData.getHtml();
    Crawler.super.performExtraction(
        new StringDocumentSource(
            html,
            pageURL
        )
    );
}
{code}
{code}
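To make the intent of that check clearer, here is a minimal standalone sketch of the same pattern. It does not use Crawler4j itself: Page, ParseData, HtmlParseData and BinaryParseData below are hypothetical stand-ins that only mimic the shape of the real classes, and extractHtml() is a made-up helper name. The point is just that HTML is only pulled out when the parse data really is HTML, and everything else is skipped.

```java
import java.util.Optional;

class ParseDataSketch {

    // Hypothetical stand-in for Crawler4j's ParseData (assumption, not the real type).
    interface ParseData {}

    // Hypothetical stand-in for Crawler4j's HtmlParseData.
    static final class HtmlParseData implements ParseData {
        private final String html;
        HtmlParseData(String html) { this.html = html; }
        String getHtml() { return html; }
    }

    // Hypothetical stand-in for any non-HTML parse result (e.g. an image or PDF).
    static final class BinaryParseData implements ParseData {}

    // Hypothetical stand-in for Crawler4j's Page.
    static final class Page {
        private final ParseData parseData;
        Page(ParseData parseData) { this.parseData = parseData; }
        ParseData getParseData() { return parseData; }
    }

    // Returns the HTML only when the page was actually parsed as HTML;
    // any other kind of parse data yields an empty Optional.
    static Optional<String> extractHtml(Page page) {
        if (page.getParseData() instanceof HtmlParseData) {
            HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
            return Optional.ofNullable(htmlParseData.getHtml());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Page htmlPage = new Page(new HtmlParseData("<html><body>hi</body></html>"));
        Page binaryPage = new Page(new BinaryParseData());

        // Only the HTML page would be handed on to the extraction step.
        extractHtml(htmlPage).ifPresent(html -> System.out.println("extract: " + html));
        System.out.println("binary page skipped: " + !extractHtml(binaryPage).isPresent());
    }
}
```

In the crawler, the body of the ifPresent() callback is where the performExtraction(...) call with the StringDocumentSource would go.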
I got totally sidetracked from this after last weekend, so apologies for the
half-baked patch :|
> LGPL'ed components cannot be included in distribution packages
> --------------------------------------------------------------
>
> Key: ANY23-37
> URL: https://issues.apache.org/jira/browse/ANY23-37
> Project: Apache Any23
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Simone Tripodi
> Priority: Critical
> Fix For: 0.7.0
>
> Attachments: ANY23-37-v2.patch, ANY23-37.patch
>
>
> While reviewing dependency licenses, I noticed that the
> it.unimi.dsi:dsiutils:2.0.1 transitive dependency is released under the
> LGPL, so it cannot be included in the non-Maven binary archives.
> A first workaround could be to exclude it from the distribution and report
> this in the README.