[ https://issues.apache.org/jira/browse/ANY23-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216527#comment-13216527 ]
Lewis John McGibbney edited comment on ANY23-37 at 2/25/12 8:18 PM:
--------------------------------------------------------------------
OK, so this patch also removes the dsiutils and fastutil libraries from the
basic-crawler pom.xml.
There will still be a compile-time error, though, because getHTML() is
deprecated in the newer version of crawler4j.
In Crawler.java [0], around lines 89-98, rather than calling page.getHTML()
(line 96), we should be doing something like:
{code}
// Only HTML pages carry markup that Any23 can extract from.
if (page.getParseData() instanceof HtmlParseData) {
    HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
    // Replaces the old page.getHTML() call.
    String html = htmlParseData.getHtml();
    Crawler.super.performExtraction(
        new StringDocumentSource(html, pageURL)
    );
}
{code}
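To try the type guard from the snippet above in isolation, here is a minimal self-contained Java sketch. It is not crawler4j or Any23 code: Page, ParseData, HtmlParseData and BinaryParseData are hypothetical stand-ins that mirror only the names used in the snippet, and the extracted list is a hypothetical placeholder for Crawler.super.performExtraction(...). It shows that HTML pages are passed on for extraction while non-HTML pages are skipped.

```java
import java.util.ArrayList;
import java.util.List;

public class HtmlExtractionSketch {

    // Hypothetical stand-ins: they mirror only the names used in the snippet
    // (getParseData(), HtmlParseData, getHtml()); they are NOT the real
    // crawler4j classes, so this compiles without the library.
    interface ParseData {}

    static final class HtmlParseData implements ParseData {
        private final String html;
        HtmlParseData(String html) { this.html = html; }
        String getHtml() { return html; }
    }

    // Hypothetical parse result for a non-HTML resource (e.g. an image).
    static final class BinaryParseData implements ParseData {}

    static final class Page {
        private final String url;
        private final ParseData parseData;
        Page(String url, ParseData parseData) {
            this.url = url;
            this.parseData = parseData;
        }
        String getUrl() { return url; }
        ParseData getParseData() { return parseData; }
    }

    // Placeholder for performExtraction(new StringDocumentSource(html, pageURL)):
    // records what would have been handed to Any23.
    static final List<String> extracted = new ArrayList<>();

    // Returns true if the page carried HTML and was passed on for extraction.
    static boolean visit(Page page) {
        ParseData data = page.getParseData();
        if (data instanceof HtmlParseData) {
            String html = ((HtmlParseData) data).getHtml();
            extracted.add(page.getUrl() + " -> " + html);
            return true;
        }
        // Non-HTML pages have no markup to extract, so they are skipped.
        return false;
    }

    public static void main(String[] args) {
        visit(new Page("http://example.org/", new HtmlParseData("<html></html>")));
        visit(new Page("http://example.org/logo.png", new BinaryParseData()));
        System.out.println(extracted);
    }
}
```

Running main prints [http://example.org/ -> <html></html>]; only the HTML page reaches the extraction step.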
I got totally sidetracked from this after last weekend, so apologies for the
half-baked patch. More details on the crawler4j API can be found at [1].
[0] https://svn.apache.org/viewvc/incubator/any23/trunk/plugins/basic-crawler/src/main/java/org/apache/any23/cli/Crawler.java?view=markup
[1] http://code.google.com/p/crawler4j/
> LGPL'ed components cannot be included in distribution packages
> --------------------------------------------------------------
>
> Key: ANY23-37
> URL: https://issues.apache.org/jira/browse/ANY23-37
> Project: Apache Any23
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Simone Tripodi
> Priority: Critical
> Fix For: 0.7.0
>
> Attachments: ANY23-37-v2.patch, ANY23-37.patch
>
>
> While reviewing dependency licenses, I noticed that the
> it.unimi.dsi:dsiutils:2.0.1 transitive dependency is released under the LGPL,
> so it cannot be included in the non-Maven binary archives.
> A first workaround could be to exclude it from those archives and mention it
> in the README.