[ 
https://issues.apache.org/jira/browse/NUTCH-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832848#comment-16832848
 ] 

ASF GitHub Bot commented on NUTCH-2706:
---------------------------------------

sebastian-nagel commented on pull request #453: NUTCH-2706 NUTCH-2650 
-addBinaryContent -base64 flag can cause "String length must be a multiple of 
four" error in IndexingJob
URL: https://github.com/apache/nutch/pull/453
 
   
   - use a conversion to base64 encoding that works with various versions of the 
commons-codec library (1.4 and 1.11) and never returns a chunked string
   
   Successfully tested with options `-addBinaryContent -base64` both in local 
and pseudo-distributed mode.
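
To illustrate why a chunked string matters here, a minimal sketch follows. It is not the code from this pull request: it uses the JDK's `java.util.Base64` as a stand-in for commons-codec, and the class and method names are hypothetical. A MIME-style (chunked) encoder inserts a line break after every 76 output characters, so the result contains characters outside the base64 alphabet and its length is generally no longer a multiple of four, which a strict decoder rejects.

```java
import java.util.Base64;

public class Base64ChunkingDemo {

  /** Encodes without line breaks; the padded output length is a multiple of four. */
  public static String encodeUnchunked(byte[] data) {
    return Base64.getEncoder().encodeToString(data);
  }

  /** Encodes MIME-style; CRLF separates output lines of at most 76 characters. */
  public static String encodeChunked(byte[] data) {
    return Base64.getMimeEncoder().encodeToString(data);
  }

  /** Returns true if the JDK's basic (strict) decoder accepts the string. */
  public static boolean decodesStrictly(String encoded) {
    try {
      Base64.getDecoder().decode(encoded);
      return true;
    } catch (IllegalArgumentException e) {
      // The basic decoder rejects characters outside the base64 alphabet,
      // including the CR and LF inserted by chunking.
      return false;
    }
  }

  public static void main(String[] args) {
    byte[] content = new byte[100]; // hypothetical binary content: 100 zero bytes

    String unchunked = encodeUnchunked(content);
    String chunked = encodeChunked(content);

    System.out.println("unchunked length % 4 = " + (unchunked.length() % 4));
    System.out.println("chunked length % 4   = " + (chunked.length() % 4));
    System.out.println("unchunked decodes strictly: " + decodesStrictly(unchunked));
    System.out.println("chunked decodes strictly:   " + decodesStrictly(chunked));
  }
}
```

The exact error text produced by Solr differs from the JDK's, but the cause sketched here is the same kind of mismatch: an encoder that wraps lines feeding a consumer that expects a single unbroken base64 string.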
 


> -addBinaryContent flag can cause "String length must be a multiple of four" 
> error in IndexingJob
> ------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2706
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2706
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.15
>         Environment: Solr:7.3.1
> Nutch: 1.15
>            Reporter: Prajeeth Emanuel
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.16
>
>
> When I run the following crawl command:
> bin/crawl -i -s /user/xxxx/seed /user/xxxx/test-crawl-8 3 
> with -addBinaryContent and -base64 passed to the index command in the crawl 
> script, I get this error:
> 2019-04-04 04:10:43,702 svnNumber= clientHw="" userId="" actionKpi="" [main] 
> WARN org.apache.hadoop.mapred.YarnChild - Exception running child : 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: ERROR: 
> [doc=73ad5e05e49054efa258e7c54ae9b9ee] Error adding field 
> 'binaryContent'='PCFET0NUWVBFIGh0bWw+DQo8aHRtbCBsYW5nPSJlbiI+DQo8aGVhZD4NCgk8bWV0YSBodHRwLWVx...
>  
> ...
>  
> msg=String length must be a multiple of four.
>  at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:559)
>  at org.apache.nutch.indexer.IndexWriters.commit(IndexWriters.java:251)
>  at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:47)
>  at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:550)
>  at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:629)
>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>  
> This looks like the same problem as 
> https://issues.apache.org/jira/browse/NUTCH-2186. I am opening a new ticket, 
> as suggested in the comments there, because my environment is different.
>  
>  
>  


