GitHub user zentol opened a pull request:

    https://github.com/apache/incubator-flink/pull/4

    Serialized String comparison, Unicode support

    The StringComparator now works on serialized data.
    
    To this end, new string read/write/copy/compare methods were introduced,
    which use a variable-length encoding for the characters.
    
    Key points:
    
        The most significant bits are written/read first.
        The first 2 bits of each encoded character store the size of that
        character in bytes.
        An encoded character occupies at most 3 bytes.
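
    The key points above can be illustrated with a minimal sketch. This is not
    the code from the pull request: the class name VarLengthCharSketch, the
    method names, and the exact prefix values (00 = 1 byte, 01 = 2 bytes,
    10 = 3 bytes) are assumptions made for illustration. The sketch also omits
    any string length header the real serializer may write.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch; names and the exact bit layout are assumptions, not the PR's code. */
final class VarLengthCharSketch {

    /** Encodes one code point (at most 22 bits) into 1 to 3 bytes, most significant bits first. */
    static byte[] encode(int codePoint) {
        if (codePoint < 0 || codePoint >= (1 << 22)) {
            throw new IllegalArgumentException("value does not fit into 22 bits: " + codePoint);
        }
        if (codePoint < (1 << 6)) {
            // assumed prefix 00: 1 byte, 6 payload bits
            return new byte[] {(byte) codePoint};
        } else if (codePoint < (1 << 14)) {
            // assumed prefix 01: 2 bytes, 14 payload bits
            return new byte[] {(byte) (0x40 | (codePoint >>> 8)), (byte) codePoint};
        } else {
            // assumed prefix 10: 3 bytes, 22 payload bits
            return new byte[] {
                (byte) (0x80 | (codePoint >>> 16)), (byte) (codePoint >>> 8), (byte) codePoint
            };
        }
    }

    /** Decodes the character that starts at the given offset. */
    static int decode(byte[] data, int offset) {
        int first = data[offset] & 0xFF;
        int length = (first >>> 6) + 1; // the top 2 bits hold (size in bytes - 1)
        if (length > 3) {
            throw new IllegalArgumentException("invalid size prefix");
        }
        int value = first & 0x3F; // strip the 2 size bits
        for (int i = 1; i < length; i++) {
            value = (value << 8) | (data[offset + i] & 0xFF);
        }
        return value;
    }

    /** Encodes every code point of a string back to back (no length header in this sketch). */
    static byte[] encodeString(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        s.codePoints().forEach(cp -> {
            byte[] encoded = encode(cp);
            out.write(encoded, 0, encoded.length);
        });
        return out.toByteArray();
    }

    /** Compares two serialized strings byte by byte (unsigned) without deserializing them. */
    static int compareSerialized(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

    Under this assumed layout, a larger code point never gets a smaller size
    prefix, so an unsigned byte-wise comparison of two encoded strings orders
    them by code point. That property is what would let a comparator work on
    serialized data directly.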
    
    Additionally, the StringSerializer now has full Unicode support. I couldn't
    find a Unicode character that uses more than 22 bits (the highest code
    point, U+10FFFF, needs 21), so 3 bytes should be sufficient.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zentol/incubator-flink string-serialization-comparator

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-flink/pull/4.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4
    
----
commit 6b3b3723c96ded4ab59f95902a90d6a67821a677
Author: zentol <[email protected]>
Date:   2014-06-10T19:28:34Z

    Serialized String comparison, Unicode support

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
