[ 
https://issues.apache.org/jira/browse/ANY23-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216527#comment-13216527
 ] 

Lewis John McGibbney edited comment on ANY23-37 at 2/25/12 8:18 PM:
--------------------------------------------------------------------

OK, so this patch also removes the dsiutils and fastutil libraries from the 
basic-crawler pom.xml.

The compile-time error will still be there, though. This is because 
getHTML() was deprecated and is no longer available on Page in the newer 
version of Crawler4j. 
Around lines 89-98 of Crawler.java [0], instead of calling 
page.getHTML() (line 96), we should use something like:

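{code}
// Only HTML pages carry markup; other parse data types are skipped.
if (page.getParseData() instanceof HtmlParseData) {
    HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
    String html = htmlParseData.getHtml();

    // Hand the HTML to Any23 for extraction, together with the page URL.
    Crawler.super.performExtraction(
        new StringDocumentSource(html, pageURL)
    );
}
{code}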
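
To make the guard easy to try without pulling in Crawler4j, here is a minimal, self-contained sketch of the same pattern. Everything in it is an assumption for illustration: the ParseData, HtmlParseData, BinaryParseData and Page types are simplified stand-ins written for this example (not the real Crawler4j classes), and HtmlGuard.htmlOf is a hypothetical helper name.

```java
import java.util.Optional;

// Hypothetical stand-ins for the Crawler4j types, reduced to what the guard needs.
interface ParseData {}

final class HtmlParseData implements ParseData {
    private final String html;
    HtmlParseData(String html) { this.html = html; }
    String getHtml() { return html; }
}

// A non-HTML payload, used to show what happens when the check fails.
final class BinaryParseData implements ParseData {}

final class Page {
    private final ParseData parseData;
    Page(ParseData parseData) { this.parseData = parseData; }
    ParseData getParseData() { return parseData; }
}

final class HtmlGuard {
    private HtmlGuard() {}

    // Returns the page's HTML if the page was parsed as HTML, otherwise empty.
    static Optional<String> htmlOf(Page page) {
        ParseData data = page.getParseData();
        if (data instanceof HtmlParseData) {
            return Optional.ofNullable(((HtmlParseData) data).getHtml());
        }
        return Optional.empty();
    }
}
```

In Crawler.java the extracted string would then be passed on to performExtraction, as in the snippet above; pages whose parse data is not HTML are simply skipped.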

I got totally sidetracked from this after last weekend, so apologies for the 
half-baked patch. More details on this can be found at [1].

[0] https://svn.apache.org/viewvc/incubator/any23/trunk/plugins/basic-crawler/src/main/java/org/apache/any23/cli/Crawler.java?view=markup
[1] http://code.google.com/p/crawler4j/
                
> LGPL'ed components cannot be included in distribution packages
> --------------------------------------------------------------
>
>                 Key: ANY23-37
>                 URL: https://issues.apache.org/jira/browse/ANY23-37
>             Project: Apache Any23
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Simone Tripodi
>            Priority: Critical
>             Fix For: 0.7.0
>
>         Attachments: ANY23-37-v2.patch, ANY23-37.patch
>
>
> While reviewing dependency licenses, I noticed that the 
> it.unimi.dsi:dsiutils:2.0.1 transitive dependency is released under the 
> LGPL, so it cannot be included in the non-Maven binary archives.
> A first workaround could be to exclude it from the archives and note this 
> in the README.


        
