[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876649#comment-14876649
 ] 

Ian Maxon commented on ASTERIXDB-1102:
--------------------------------------

Jianfeng, Taewoo, and I discussed this earlier today, and I think 
the overall consensus is that this is more complex than it seems. 
There are a lot of side effects. 

If we introduce two types for backward compatibility:
- We have to be sure that comparisons between the two types work correctly.
- One also can't index on a field of mixed type, so if someone had an index on 
a string field and we then changed the default to text, that would break things.
- Switching between the two types at load time is similarly tricky. 

If we make the default a 4-byte fixed-length encoding:
- This will seriously hurt the performance of inverted indexes, because they 
use the string type to hold tokens and grams. A variable-length encoding (a la 
UTF-8) would solve this and shouldn't be too ugly.  
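
To make the variable-length idea concrete, here is a minimal sketch of a string serializer whose length prefix grows only as needed. It is an illustration, not AsterixDB code: the class and method names (VarLenString, writeLength, readLength, encode, decode) are hypothetical, and the prefix uses a 7-bits-per-byte scheme with a continuation bit, which is similar in spirit to UTF-8 but not the same layout. Presumably the point for inverted indexes is that short tokens and grams would keep a one-byte prefix instead of paying four bytes each.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class VarLenString {
    // Write the byte length 7 bits at a time, low bits first.
    // A set high bit means "another length byte follows".
    static void writeLength(DataOutputStream out, int length) throws IOException {
        while ((length & ~0x7F) != 0) {
            out.writeByte((length & 0x7F) | 0x80);
            length >>>= 7;
        }
        out.writeByte(length);
    }

    // Read a length written by writeLength (at most 5 bytes for an int).
    static int readLength(DataInputStream in) throws IOException {
        int length = 0;
        int shift = 0;
        while (true) {
            int b = in.readUnsignedByte();
            length |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return length;
            }
            shift += 7;
            if (shift > 28) {
                throw new IOException("length prefix too long");
            }
        }
    }

    // Serialize a string as: variable-length byte count, then UTF-8 bytes.
    static byte[] encode(String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        writeLength(out, utf8.length);
        out.write(utf8);
        out.flush();
        return buffer.toByteArray();
    }

    // Inverse of encode.
    static String decode(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        byte[] utf8 = new byte[readLength(in)];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

With this layout, strings of up to 127 bytes carry a one-byte prefix, strings of up to 16383 bytes carry two bytes, and a 70000-byte string carries three, so there is no 64K ceiling and short index tokens stay compact. How such a format would interact with comparators and existing on-disk data is left open here.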

> Add a TEXT-like data type to enable storing the string longer than 64K
> ----------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1102
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1102
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>          Components: Data Model
>            Reporter: Jianfeng Jia
>            Assignee: Jianfeng Jia
>
> The current "String" type can't handle strings longer than 64K. The first 
> reason is that we use Java's DataOutputStream to serialize it, which stores 
> the length in two bytes. However, it should still serve the basic requirements 
> of a "string" type.
> We need a special TEXT-like data type to deal with long strings, and should 
> probably add some search-related functionality.
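
For reference, the two-byte length limit described in the issue can be reproduced with plain java.io. The sketch below assumes the serialization path goes through DataOutputStream.writeUTF, which the issue text suggests but does not state outright; the class and method names (WriteUtfLimit, fitsInWriteUTF) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

public class WriteUtfLimit {
    // Returns true if an ASCII string of the given length can be written
    // with writeUTF, false if the two-byte length prefix overflows.
    static boolean fitsInWriteUTF(int asciiLength) throws IOException {
        char[] chars = new char[asciiLength];
        Arrays.fill(chars, 'a');
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(new String(chars));
            return true;
        } catch (UTFDataFormatException e) {
            // writeUTF rejects strings whose encoded form exceeds 65535 bytes.
            return false;
        }
    }
}
```

Under that assumption, a string that encodes to 65535 bytes is still accepted, while one byte more is rejected. The limit is on encoded bytes, so strings with non-ASCII characters reach it at fewer characters.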



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
