Hi Ye,

Please feel free to comment fully on any issue you find on the Nutch Jira. If you find other bugs or improvements which are not already open on the Jira instance, then please feel free to open new ones once you are sure they are not duplicates and cannot be resolved via the user@ list.
As Markus has explained on NUTCH-968, if you could check out trunk and provide a patch against it, this would be excellent. Test cases are also very welcome.

Thank you very much for your input.

Lewis

On Fri, Aug 31, 2012 at 3:15 PM, Ye T Thet <[email protected]> wrote:
> Hi Folks,
>
> There is an issue with the protocol-file plugin while fetching files that
> contain CJK characters in the file name. JIRA Nutch 968
>
> After I checked the code, I discovered that the problem is due to the encoding
> of the file name while fetching the directory. After changing a couple of
> lines as discussed in the JIRA Nutch 968, the issue is resolved.
>
> I see the issue is still open in JIRA and the latest Nutch release has no
> fix in it yet. I would like to discuss the solution I have further here on the
> list and submit the patch once it is fine.
>
> Anyone in for it? I can elaborate further on the fix.
>
> Cheers,
>
> Ye

--
Lewis

