[
https://issues.apache.org/jira/browse/NUTCH-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775661#comment-16775661
]
ASF GitHub Bot commented on NUTCH-2696:
---------------------------------------
sebastian-nagel commented on pull request #440: NUTCH-2696 Nutch SegmentReader
does not dump non-ASCII characters with Hadoop 3.x
URL: https://github.com/apache/nutch/pull/440
Open streams in SegmentReader using fixed UTF-8 encoding.
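The one-line description names the fix but not the failure mode. Below is a minimal sketch that assumes the dump writer previously relied on the JVM default charset (for example, a bare `new OutputStreamWriter(out)`). When that default is US-ASCII, `OutputStreamWriter` replaces every unmappable character with `?`, which is the same pattern as the -dump output in the report. The class and helper names (`Utf8WriterSketch`, `openWriter`, `writeAndDecode`) are hypothetical. A `ByteArrayOutputStream` stands in for the stream that Nutch would obtain from the Hadoop `FileSystem`.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8WriterSketch {

    // Hypothetical helper: wrap a raw byte stream in a writer with an explicit charset
    // instead of the platform default (which is what new OutputStreamWriter(out) would use).
    static PrintWriter openWriter(OutputStream out, Charset cs) {
        return new PrintWriter(new OutputStreamWriter(out, cs));
    }

    // Writes text with the given charset, then decodes the bytes as UTF-8,
    // i.e. the way a consumer of the dump file would normally read it.
    static String writeAndDecode(String text, Charset writerCharset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (PrintWriter w = openWriter(bytes, writerCharset)) {
            w.print(text);
        } // closing the PrintWriter flushes the encoder into 'bytes'
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of javac's encoding.
        String text = "Chercher sur Wikip\u00e9dia en fran\u00e7ais";
        // Simulates a JVM whose default charset is US-ASCII: unmappable chars become '?'.
        System.out.println(writeAndDecode(text, StandardCharsets.US_ASCII)); // Chercher sur Wikip?dia en fran?ais
        // A fixed UTF-8 writer round-trips the text whatever the platform default is.
        System.out.println(text.equals(writeAndDecode(text, StandardCharsets.UTF_8))); // true
    }
}
```

Passing the charset explicitly removes the dependency on the locale of whichever JVM happens to run the code. That would also explain why the same command can behave differently in local and distributed mode.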
----------------------------------------------------------------
> Nutch SegmentReader does not dump non-ASCII characters with Hadoop 3.x
> ----------------------------------------------------------------------
>
> Key: NUTCH-2696
> URL: https://issues.apache.org/jira/browse/NUTCH-2696
> Project: Nutch
> Issue Type: Bug
> Components: segment
> Environment: Hadoop version : 3.0.0 (CDH 6.1)
> Nutch : 1.15
> Mode : distributed mode
> Reporter: Laurent Hervaud
> Priority: Major
>
> All Nutch tasks work properly with Hadoop 3.x, except SegmentReader.
> SegmentReader with the -get option works fine.
> SegmentReader with the -dump option replaces non-ASCII characters with "?".
> Example URL: [http://www.wikipedia.fr/index.php]
>
> {code:java}
> command : ./runtime/deploy/bin/nutch readseg -dump
> /user/nutch/crawl1.15/segments/20190221093756 /tmp/dump1.15 -nocontent
> -nogenerate -noparse -noparsedata
> ParseText::
> Wikipedia.fr - Portail de recherche sur les projets Wikim?dia
> Chercher sur Wikip?dia en fran?ais
> L?encyclop?die librement r?utilisable que chacun peut am?liorer.
> {code}
>
>
> {code:java}
> command : ./runtime/deploy/bin/nutch readseg -get
> /user/nutch/crawl1.15/segments/20190221093756
> http://www.wikipedia.fr/index.php -nocontent -nogenerate -noparse -noparsedata
> ParseText::
> Wikipedia.fr - Portail de recherche sur les projets Wikimédia
> Chercher sur Wikipédia en français
> L’encyclopédie librement réutilisable que chacun peut améliorer.
> {code}
>
> I tried building with the Hadoop 3.0.0 dependencies in ivy.xml, but I got the
> same result.
> It works fine in local mode.
>
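The report states that local mode works but distributed mode does not. One plausible explanation, stated here as an assumption, is that the JVM default charset depends on the environment (locale, `file.encoding`) and can differ between a local shell and the task JVMs on cluster nodes. The sketch below prints the default charset so that the two environments can be compared. It also shows the read side of the problem: UTF-8 bytes that are decoded as US-ASCII come back as U+FFFD replacement characters rather than the original text. `DefaultCharsetCheck` and `readLine` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {

    // Hypothetical helper: read the first line of a stream using an explicit charset.
    static String readLine(InputStream in, Charset cs) {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, cs))) {
            return r.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // These values come from the environment of the JVM that runs this code;
        // compare a local run against a run inside a cluster task.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        String original = "L\u2019encyclop\u00e9die";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Decoding UTF-8 bytes as US-ASCII replaces each malformed byte with U+FFFD.
        String asAscii = readLine(new ByteArrayInputStream(utf8), StandardCharsets.US_ASCII);
        long replaced = asAscii.chars().filter(c -> c == 0xFFFD).count();
        System.out.println("replacement chars when read as US-ASCII: " + replaced);

        // Decoding with the charset the bytes were written in restores the text.
        String asUtf8 = readLine(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        System.out.println("round-trips with UTF-8: " + asUtf8.equals(original));
    }
}
```

If a later step writes such a line back out through a US-ASCII writer, each U+FFFD also ends up as `?`. Fixing UTF-8 on both the input and the output streams avoids the problem in either direction.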
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)