[
https://issues.apache.org/jira/browse/ASTERIXDB-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876649#comment-14876649
]
Ian Maxon commented on ASTERIXDB-1102:
--------------------------------------
Jianfeng, Taewoo, and I discussed this briefly today, and I think the overall
consensus is that this is actually more complex than it seems. There are a lot
of side effects.
If we introduce two types to keep backward compatibility:
- We have to be sure that comparisons between these two types work fine.
- One also can't index on a field of mixed type, so if someone had an index on
string, and then we changed to text by default, that would break things.
- Switching between the two types at load time is similarly tricky.
If we make the default a 4-byte fixed-length encoding:
- This will seriously hurt the performance of inverted indexes, because they
use the string type to hold tokens and grams. A variable-length (a la UTF-8)
encoding would solve this and shouldn't be too ugly.
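
To make the variable-length idea concrete, here is a rough sketch of a string
serializer whose length prefix grows with the string. It is only an
illustration, not AsterixDB code: the class and method names are made up, and
the prefix uses a base-128 varint (7 bits per byte, high bit as a continuation
flag) rather than UTF-8's exact byte layout. Short tokens and grams pay one
byte of prefix, while strings past 64K still fit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: names are illustrative and do not come from AsterixDB.
class LengthPrefixSketch {

    // Writes len as a base-128 varint: 7 payload bits per byte,
    // with the high bit set on every byte except the last.
    static void writeVarLen(DataOutputStream out, int len) throws IOException {
        if (len < 0) {
            throw new IllegalArgumentException("negative length");
        }
        while ((len & ~0x7F) != 0) {
            out.writeByte((len & 0x7F) | 0x80);
            len >>>= 7;
        }
        out.writeByte(len);
    }

    // Reads a length written by writeVarLen (at most 5 bytes for an int).
    static int readVarLen(DataInputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.readUnsignedByte();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                if (result < 0) {
                    throw new IOException("length overflow");
                }
                return result;
            }
        }
        throw new IOException("malformed length prefix");
    }

    // Number of prefix bytes writeVarLen emits for a given length.
    static int prefixSize(int len) {
        int size = 1;
        while ((len >>>= 7) != 0) {
            size++;
        }
        return size;
    }

    static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        writeVarLen(out, bytes.length);
        out.write(bytes);
    }

    static String readString(DataInputStream in) throws IOException {
        byte[] bytes = new byte[readVarLen(in)];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Serializes and deserializes s in memory; wraps IOException for convenience.
    static String roundTrip(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            writeString(new DataOutputStream(buf), s);
            return readString(new DataInputStream(
                    new ByteArrayInputStream(buf.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        char[] chars = new char[100_000];
        Arrays.fill(chars, 'x');
        String big = new String(chars); // longer than the 64K two-byte limit

        System.out.println("prefix bytes for a 3-byte token: " + prefixSize(3));
        System.out.println("prefix bytes for 100000 bytes: " + prefixSize(big.length()));
        System.out.println("round trip ok: " + big.equals(roundTrip(big)));
    }
}
```

Compared with a fixed 4-byte length, anything under 128 bytes saves three
bytes per value, which is where most inverted-index tokens and grams would
fall. The flip side is that the length can no longer be read at a fixed
offset, so any code that assumes a constant-size header would need to change.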
> Add a TEXT-like data type to enable storing the string longer than 64K
> ----------------------------------------------------------------------
>
> Key: ASTERIXDB-1102
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1102
> Project: Apache AsterixDB
> Issue Type: Improvement
> Components: Data Model
> Reporter: Jianfeng Jia
> Assignee: Jianfeng Jia
>
> The current "String" type can't handle the string longer than 64K. The first
> reason is that we are using java DataOutputStream to serialize it. It stores
> the length using two bytes. However, it should serve the basic requirement
> for "string" type.
> We need a special TEXT-like datatype to deal with long strings. And probably
> add some search-related functionalities.