[
https://issues.apache.org/jira/browse/CODEC-125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058586#comment-13058586
]
Matthew Pocock commented on CODEC-125:
--------------------------------------
I have renamed the bmpm package to bm. Do you want me to move
BeiderMorseEncoder into the bm package? I put it into the language package
because that is where all the other encoders are, and I presume having them in
that package allows them to be automagically imported by things like the Lucene
configuration files. However, I put all the other stuff in bm because it is
specific to the bmpm method and is worth having publicly visible, as you can do
some custom things with it that are not reasonable to expose through the codec.
It also has no relevance to the other codecs, so I didn't want to clutter up the
primary package.
So, I've applied the patch on Ubuntu to a clean checkout of commons-codec. This
failed to pass all tests because the empty files in the patch were not created
as empty files in the source tree. I did not know that patch behaved like
this. Anyway, I've put a comment in every otherwise empty file, and now on
Ubuntu the patch applies cleanly to commons-codec and results in a project that
builds without errors.
Then I made a clean checkout of commons-codec on Windows 7 and applied the
revised patch using TortoiseSVN. When I build this, I get errors. It looks like
Windows is mangling the Unicode text files during application of the patch. You
said that you were seeing '?' characters in the text files. There are no such
characters in the original text or in the patch file, so I think this
indicates that the text got mangled during patch application. After
applying the patch on Windows using TortoiseSVN, in lang.txt I see '?' for each
Cyrillic, Greek, Hebrew and Arabic character. In the original file on Windows
I see various symbols. When I look at the patch file directly in Windows, I see
symbols. I've looked at lang.txt in the TortoiseMerge tool, and regardless of
what I set the default encoding to, the interesting Unicode chars are mangled
to '?'.
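For what it's worth, here is a minimal Java sketch of one way text can end up
as '?'. It is only an illustration of the suspected mechanism, not a diagnosis
of what TortoiseSVN actually does: the class name EncodingMangleDemo and the
method roundTrip are made up for this example, and ISO-8859-1 is used as an
assumed stand-in for whatever single-byte code page a tool might fall back to.
When a string is encoded into a charset that cannot represent a character,
String.getBytes substitutes the charset's replacement byte, which for
ISO-8859-1 is '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of commons-codec or the patch.
class EncodingMangleDemo {

    /**
     * Encodes the text into the given charset and decodes it again,
     * roughly what a tool does if it rewrites a file in that charset.
     */
    public static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset); // unmappable chars become the replacement byte
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // One Cyrillic, one Greek, one Hebrew and one Arabic letter.
        String sample = "\u0434\u03b1\u05d0\u0628";

        // UTF-8 can represent every character, so the text survives intact.
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample)); // true

        // ISO-8859-1 (assumed stand-in for a legacy code page) cannot,
        // so each letter is replaced by '?'.
        System.out.println(roundTrip(sample, StandardCharsets.ISO_8859_1)); // ????
    }
}
```

If something like this is happening, the damage is done when the file is
written, which would explain why changing the viewer's default encoding
afterwards makes no difference.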
I've run out of ideas about how to apply the patch on Windows. What tool were
you using to apply the patch? Can you tell it that the patch file is UTF-8?
> Implement a Beider-Morse phonetic matching codec
> ------------------------------------------------
>
> Key: CODEC-125
> URL: https://issues.apache.org/jira/browse/CODEC-125
> Project: Commons Codec
> Issue Type: New Feature
> Reporter: Matthew Pocock
> Priority: Minor
> Attachments: bm-gg.diff, bmpm.patch, bmpm.patch
>
>
> I have implemented Beider Morse Phonetic Matching as a codec against the
> commons-codec svn trunk. I would like to contribute this to commons-codec.