[ https://issues.apache.org/jira/browse/DRILL-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821571#comment-16821571 ]
Paul Rogers commented on DRILL-7020:
------------------------------------
The "write" part of the message means "write to the value vector", so read this
as "tried to write too large a value to the column's value vector."
The request here seems to be to modify the "compliant" text reader to avoid the
use of the fixed-size buffer for column values in order to allow column values
larger than 64K.
The simple way to do this is to use a Java string for the value, or to
reallocate the column buffer as needed for ever larger values.
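As a rough illustration of the reallocation option, the sketch below grows a field buffer by doubling instead of failing at a fixed limit. GrowableFieldBuffer and its methods are hypothetical names for this example only; they are not Drill's classes, and a real implementation would also have to bound the maximum size.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical field buffer that grows on demand (not Drill's actual class). */
final class GrowableFieldBuffer {
    private byte[] data;
    private int length;

    GrowableFieldBuffer(int initialCapacity) {
        data = new byte[Math.max(1, initialCapacity)];
    }

    /** Appends one byte, doubling the backing array when it is full. */
    void append(byte b) {
        if (length == data.length) {
            // Assumption: no upper bound is enforced in this sketch.
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[length++] = b;
    }

    int length() {
        return length;
    }

    int capacity() {
        return data.length;
    }

    /** Clears the buffer for the next field while keeping the allocated array. */
    void reset() {
        length = 0;
    }

    String asString() {
        return new String(data, 0, length, StandardCharsets.UTF_8);
    }
}
```

The cost of this approach is that each large value still needs one contiguous intermediate buffer at least as large as the value itself.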
A more elegant way, now that the compliant reader is on the row set framework,
is to implement an "append" operation which will take a buffer and append it to
the value (if any) already in the column. This will allow reading large values
without having to allocate large intermediate buffers.
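A minimal sketch of the "append" idea follows, under stated assumptions: AppendingColumnWriter stands in for a column writer and uses a ByteArrayOutputStream in place of a value vector, and ChunkedFieldReader stands in for the text reader. Both names and all of their methods are hypothetical and are not the row set framework's API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a column writer that supports appending. */
final class AppendingColumnWriter {
    // Assumption: a byte stream plays the role of the column's value vector.
    private final ByteArrayOutputStream value = new ByteArrayOutputStream();

    /** Appends the first len bytes of chunk to the value already in the column. */
    void appendBytes(byte[] chunk, int len) {
        value.write(chunk, 0, len);
    }

    int valueLength() {
        return value.size();
    }

    String valueAsString() {
        return new String(value.toByteArray(), StandardCharsets.UTF_8);
    }
}

/** Hypothetical reader side: a small fixed chunk that is flushed whenever it fills. */
final class ChunkedFieldReader {
    private final byte[] chunk;
    private final AppendingColumnWriter writer;
    private int used;

    ChunkedFieldReader(int chunkSize, AppendingColumnWriter writer) {
        this.chunk = new byte[chunkSize];
        this.writer = writer;
    }

    /** Accepts one byte of the current field, flushing first if the chunk is full. */
    void accept(byte b) {
        if (used == chunk.length) {
            flush();
        }
        chunk[used++] = b;
    }

    /** Flushes whatever remains once the field delimiter is reached. */
    void endField() {
        flush();
    }

    private void flush() {
        if (used > 0) {
            writer.appendBytes(chunk, used);
            used = 0;
        }
    }
}
```

With a 1024-byte chunk, a 66000-character field would reach the column through repeated appends, so the reader never holds more than one chunk of the value at a time.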
> big varchar doesn't work with extractHeader=true
> ------------------------------------------------
>
> Key: DRILL-7020
> URL: https://issues.apache.org/jira/browse/DRILL-7020
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 1.15.0
> Reporter: benj
> Priority: Major
>
> With a CSV file named TEST such as
> {code:java}
> col1,col2
> w,x
> ...y...,z
> {code}
> where ...y... is a string of more than 65536 characters (say, 66000),
> a SELECT with +*extractHeader=false*+ is OK
> {code:java}
> SELECT * FROM TABLE(tmp.`TEST`(type => 'text', fieldDelimiter => ',',
> extractHeader => false));
> col1 | col2
> +---------+------
> | w | x
> | ...y... | z
> {code}
> But a SELECT with +*extractHeader=true*+ gives an error
> {code:java}
> SELECT * FROM TABLE(tmp.`TEST`(type => 'text', fieldDelimiter => ',',
> extractHeader => true));
> Error: UNSUPPORTED_OPERATION ERROR: Trying to write something big in a column
> columnIndex 1
> Limit 65536
> Fragment 0:0
> {code}
> Note that it is possible to use extractHeader=false with skipFirstLine=true, but
> in that case the column names cannot be obtained automatically.