[
https://issues.apache.org/jira/browse/DRILL-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852568#comment-16852568
]
ASF GitHub Bot commented on DRILL-7258:
---------------------------------------
paul-rogers commented on pull request #1802: DRILL-7258: Remove field width
limit for text reader
URL: https://github.com/apache/drill/pull/1802
The V2 text reader enforced a limit of 64K characters when using
column headers, but not when using the columns[] array. The V3 reader
enforced the 64K limit in both cases.
This patch removes the 64K limit in both cases. The only remaining
limit is the 16MB vector size limit. With headers, no one column can
exceed 16MB. With the columns[] array, no one row can exceed 16MB.
(The 16MB limit is set by the Netty memory allocator.)
Added an "appendBytes()" method to the scalar column writer which adds
additional bytes to those already written for a specific column or
array element value. The method is implemented for VarChar, Var16Char
and VarBinary vectors. It throws an exception for all other types.
When used with a type conversion shim, the appendBytes() method throws
an exception. This should be OK because the preceding setBytes() call
should already have failed: a huge value is not acceptable for numeric
or date type conversions.
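
To illustrate the set-then-append pattern described above, here is a
minimal sketch. It is not Drill code: ToyVarCharWriter, writeInChunks()
and the (byte[], int) signatures are hypothetical stand-ins, and the
sketch assumes a parser that hands over a long field in fixed-size
chunks. The first chunk starts the value and later chunks extend it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Illustration only; a hypothetical stand-in, not Drill's column writer. */
public class AppendSketch {

  /** Toy writer for a single variable-width value. */
  static class ToyVarCharWriter {
    private final ByteArrayOutputStream value = new ByteArrayOutputStream();

    /** Starts a new value, discarding anything written before. */
    void setBytes(byte[] buf, int len) {
      value.reset();
      value.write(buf, 0, len);
    }

    /** Adds bytes to the value already written. */
    void appendBytes(byte[] buf, int len) {
      value.write(buf, 0, len);
    }

    String asString() {
      return new String(value.toByteArray(), StandardCharsets.UTF_8);
    }
  }

  /** Writes one field in chunks of at most chunkSize bytes. */
  static String writeInChunks(byte[] field, int chunkSize) {
    ToyVarCharWriter writer = new ToyVarCharWriter();
    byte[] chunk = new byte[chunkSize];
    boolean first = true;
    for (int offset = 0; offset < field.length; offset += chunkSize) {
      int len = Math.min(chunkSize, field.length - offset);
      System.arraycopy(field, offset, chunk, 0, len);
      if (first) {
        writer.setBytes(chunk, len);     // first chunk begins the value
        first = false;
      } else {
        writer.appendBytes(chunk, len);  // later chunks extend it
      }
    }
    return writer.asString();
  }

  public static void main(String[] args) {
    byte[] field = "abcdefghij".getBytes(StandardCharsets.UTF_8);
    System.out.println(writeInChunks(field, 4));
  }
}
```

In this toy version the value can grow without bound; in the patch the
growth is capped by the 16MB vector limit noted earlier.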
Added unit tests for the append feature, including the batch overflow
case (when appending bytes causes the vector or batch to overflow).
Also added tests to verify that the text reader no longer limits column
width, both with and without headers.
> [Text V3 Reader] Unsupported operation error is thrown when select a column
> with a long string
> ----------------------------------------------------------------------------------------------
>
> Key: DRILL-7258
> URL: https://issues.apache.org/jira/browse/DRILL-7258
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.16.0
> Reporter: Anton Gozhiy
> Assignee: Paul Rogers
> Priority: Major
> Labels: arina
> Fix For: 1.17.0
>
> Attachments: 100000.tbl
>
>
> *Data:*
> 100000.tbl is attached
> *Steps:*
> # Set exec.storage.enable_v3_text_reader=true
> # Run the following query:
> {code:sql}
> select * from dfs.`/tmp/drill/data/100000.tbl`
> {code}
> *Expected result:*
> The query should return result normally.
> *Actual result:*
> Exception is thrown:
> {noformat}
> UNSUPPORTED_OPERATION ERROR: Drill Remote Exception
> (java.lang.Exception) UNSUPPORTED_OPERATION ERROR: Text column is too large.
> Column 0
> Limit 65536
> Fragment 0:0
> [Error Id: 5f73232f-f0c0-48aa-ab0f-b5f86495d3c8 on userf87d-pc:31010]
> org.apache.drill.common.exceptions.UserException$Builder.build():630
> org.apache.drill.exec.store.easy.text.compliant.v3.BaseFieldOutput.append():131
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseValueAll():208
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseValue():225
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseField():341
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseRecord():137
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseNext():388
> org.apache.drill.exec.store.easy.text.compliant.v3.CompliantTextBatchReader.next():220
> org.apache.drill.exec.physical.impl.scan.framework.ShimBatchReader.next():132
> org.apache.drill.exec.physical.impl.scan.ReaderState.readBatch():397
> org.apache.drill.exec.physical.impl.scan.ReaderState.next():354
> org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.nextAction():184
> org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.next():159
> org.apache.drill.exec.physical.impl.protocol.OperatorDriver.doNext():176
> org.apache.drill.exec.physical.impl.protocol.OperatorDriver.next():114
> org.apache.drill.exec.physical.impl.protocol.OperatorRecordBatch.next():147
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
> .......():0
> org.apache.hadoop.security.UserGroupInformation.doAs():1746
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
> org.apache.drill.common.SelfCleaningRunnable.run():38
> .......():0
> {noformat}
> *Note:* works fine with v2 reader.