[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979804#comment-14979804
 ] 

ASF subversion and git services commented on ASTERIXDB-1102:
------------------------------------------------------------

Commit 742aba85e00033d4561e194358d0bcd53c775b3f in incubator-asterixdb's branch 
refs/heads/master from [~javierjia]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-asterixdb.git;h=742aba8 ]

ASTERIXDB-1102: VarSize Encoding to store length of String and ByteArray

This patch changes the encoding that stores the length of
variable-length types (e.g. String, ByteArray) from a fixed-size encoding
(2 bytes) to a variable-size encoding (1 to 5 bytes).
It resolves issue 1102 by enabling us to store a String longer than 64K.
Also, for the common case of a short string (length <= 127), it saves
one byte per string.
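
To illustrate how a length can take 1 to 5 bytes, here is a minimal
sketch of one common variable-size integer scheme: 7 payload bits per
byte, with the high bit marking "more bytes follow". The class and method
names are hypothetical, and the byte order (most significant group first)
is an assumption; the patch's actual layout may differ.

```java
// Hypothetical sketch, not the code from the patch.
final class VarLenSketch {

    // Bytes needed for a non-negative length at 7 payload bits per byte.
    public static int encodedSize(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("negative length");
        }
        int size = 1;
        while ((length >>>= 7) != 0) {
            size++;
        }
        return size;
    }

    // Encode with the most significant 7-bit group first (assumed order).
    // Every byte except the last has its high bit set as a continuation flag.
    public static byte[] encode(int length) {
        int size = encodedSize(length);
        byte[] out = new byte[size];
        for (int i = size - 1; i >= 0; i--) {
            out[i] = (byte) (length & 0x7F);
            if (i != size - 1) {
                out[i] |= (byte) 0x80;
            }
            length >>>= 7;
        }
        return out;
    }

    // Decode a length starting at the given offset; reads at most 5 bytes.
    public static int decode(byte[] in, int offset) {
        int value = 0;
        for (int i = 0; i < 5; i++) {
            byte b = in[offset + i];
            value = (value << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        throw new IllegalArgumentException("length encoding longer than 5 bytes");
    }
}
```

Under this scheme a length up to 127 fits in one byte, a length of 65536
takes three bytes, and any non-negative 32-bit length fits in five.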

Some important changes include:
1. The UTF8StringSerDer and ByteArraySerDer are no longer singleton
instances. I need some state to speed up the serialization and avoid
object creation. Luckily, 99% of the Serializers were already used in the
factory way. The other 1% have been fixed.

Separately, for test support, the ExecutionTest can now produce the
only.xml, which stores the previously failed runtime test.xml. This can
speed up the debugging process.

Change-Id: I41fff780f5c071742ef10129d83c8f945d5886d7
Reviewed-on: https://asterix-gerrit.ics.uci.edu/450
Tested-by: Jenkins <[email protected]>
Reviewed-by: Jianfeng Jia <[email protected]>


> Add a TEXT-like data type to enable storing the string longer than 64K
> ----------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1102
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1102
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>          Components: Data Model
>            Reporter: Jianfeng Jia
>            Assignee: Jianfeng Jia
>
> The current "String" type can't handle strings longer than 64K. The first 
> reason is that we use Java's DataOutputStream to serialize it, which stores 
> the length in two bytes. However, handling such strings should be a basic 
> requirement for the "string" type.
> We need a special TEXT-like datatype to deal with long strings, and probably 
> some search-related functionality as well.
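
The two-byte limit described above can be reproduced with the standard
library alone. The sketch below uses a hypothetical helper class; it
relies only on the documented behavior of DataOutputStream.writeUTF,
which throws UTFDataFormatException when the encoded string exceeds
65535 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class WriteUtfLimitSketch {

    // Returns true if a string of the given number of ASCII characters
    // (one encoded byte each) can be written with writeUTF.
    public static boolean fitsInWriteUTF(int asciiChars) {
        char[] chars = new char[asciiChars];
        Arrays.fill(chars, 'a');
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(new String(chars));
            return true;
        } catch (UTFDataFormatException e) {
            // Encoded length does not fit in the two-byte length prefix.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this helper, a string of 65535 ASCII characters is accepted and one
of 65536 is rejected.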



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
