[ https://issues.apache.org/jira/browse/DRILL-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756537#comment-16756537 ]
Paul Rogers commented on DRILL-7020:
------------------------------------
The size limit is hard-coded into the "compliant" text reader, as you
noted. I'm not sure the limit is necessary. Drill uses a 4-byte offset vector
to track VARCHAR values within a VARCHAR vector, so the offsets themselves can
address values larger than 65536 bytes. The fix might be as easy as removing
the size check.
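To illustrate the offset-vector point, here is a minimal sketch of how 4-byte
offsets can track variable-width values. It is not Drill's actual vector code:
the class OffsetVectorSketch and its methods are hypothetical names made up for
this example, and it does not model other limits (such as vector or batch
memory limits) that may still apply.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not Drill's VARCHAR vector implementation.
class OffsetVectorSketch {
    private final ByteBuffer data;  // concatenated bytes of all values
    private final int[] offsets;    // 4-byte offsets; value i spans offsets[i]..offsets[i+1]
    private int count = 0;

    OffsetVectorSketch(int maxValues, int maxBytes) {
        data = ByteBuffer.allocate(maxBytes);
        offsets = new int[maxValues + 1];  // offsets[0] is 0 by default
    }

    // Append one value and record where it ends.
    void add(byte[] value) {
        data.put(value);
        count++;
        offsets[count] = data.position();
    }

    // Length of value i is the difference between adjacent offsets.
    int length(int index) {
        return offsets[index + 1] - offsets[index];
    }

    String get(int index) {
        return new String(data.array(), offsets[index], length(index),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Build a 66000-character value, as in the issue description.
        char[] big = new char[66000];
        Arrays.fill(big, 'y');

        OffsetVectorSketch vector = new OffsetVectorSketch(2, 70000);
        vector.add("w".getBytes(StandardCharsets.UTF_8));
        vector.add(new String(big).getBytes(StandardCharsets.UTF_8));

        System.out.println("value 0 length = " + vector.length(0));
        System.out.println("value 1 length = " + vector.length(1));
    }
}
```

In this sketch nothing about the offsets restricts a single value to 65536
bytes, which is why the hard-coded check in the reader looks like the only
thing enforcing the limit.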
> big varchar doesn't work with extractHeader=true
> ------------------------------------------------
>
> Key: DRILL-7020
> URL: https://issues.apache.org/jira/browse/DRILL-7020
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 1.15.0
> Reporter: benj
> Priority: Major
>
> With a CSV file named TEST like
> {code:java}
> col1,col2
> w,x
> ...y...,z
> {code}
> where ...y... is a string of more than 65536 characters (say 66000, for example),
> a SELECT with +*extractHeader=false*+ works:
> {code:java}
> SELECT * FROM TABLE(tmp.`TEST`(type => 'text', fieldDelimiter => ',',
> extractHeader => false));
> col1 | col2
> +---------+------
> | w | x
> | ...y... | z
> {code}
> But a SELECT with +*extractHeader=true*+ gives an error:
> {code:java}
> SELECT * FROM TABLE(tmp.`TEST`(type => 'text', fieldDelimiter => ',',
> extractHeader => true));
> Error: UNSUPPORTED_OPERATION ERROR: Trying to write something big in a column
> columnIndex 1
> Limit 65536
> Fragment 0:0
> {code}
> Note that it is possible to use extractHeader=false with skipFirstLine=true, but
> in this case it's not possible to get the column names automatically.