[ https://issues.apache.org/jira/browse/DRILL-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852568#comment-16852568 ]

ASF GitHub Bot commented on DRILL-7258:
---------------------------------------

paul-rogers commented on pull request #1802: DRILL-7258: Remove field width 
limit for text reader
URL: https://github.com/apache/drill/pull/1802
 
 
   The V2 text reader enforced a limit of 64K characters when using
   column headers, but not when using the columns[] array. The V3 reader
   enforced the 64K limit in both cases.
   
   This patch removes the limit in both cases. The limit now is the
   16MB vector size limit. With headers, no one column can exceed 16MB.
   With the columns[] array, no one row can exceed 16MB. (The 16MB
   limit is set by the Netty memory allocator.)
   
   Added an "appendBytes()" method to the scalar column writer which adds
   additional bytes to those already written for a specific column or
   array element value. The method is implemented for VarChar, Var16Char,
   and VarBinary vectors. It throws an exception for all other types.
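
A minimal sketch of the append semantics described above, not Drill's actual implementation. The class `AppendBytesSketch`, its `Kind` enum, and the hard-coded 16 MB ceiling are assumptions made for illustration: `setBytes()` starts a value, `appendBytes()` extends it, and a non-variable-width type rejects the append.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a scalar column writer; not Drill's real class.
class AppendBytesSketch {
    enum Kind { VARCHAR, VAR16CHAR, VARBINARY, INT }

    // Assumed per-value ceiling, mirroring the 16 MB vector limit in the text.
    static final int MAX_VALUE_BYTES = 16 * 1024 * 1024;

    private final Kind kind;
    private final ByteArrayOutputStream current = new ByteArrayOutputStream();

    AppendBytesSketch(Kind kind) {
        this.kind = kind;
    }

    // Starts a new value, discarding anything written before.
    void setBytes(byte[] value, int len) {
        current.reset();
        write(value, len);
    }

    // Adds bytes to the value already written; only variable-width kinds allow it.
    void appendBytes(byte[] value, int len) {
        if (kind == Kind.INT) {
            throw new UnsupportedOperationException("appendBytes not supported for " + kind);
        }
        write(value, len);
    }

    private void write(byte[] value, int len) {
        if ((long) current.size() + len > MAX_VALUE_BYTES) {
            throw new IllegalStateException("value exceeds " + MAX_VALUE_BYTES + " bytes");
        }
        current.write(value, 0, len);
    }

    int length() {
        return current.size();
    }

    String asString() {
        return new String(current.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

In this sketch a reader that parses a long field in chunks would call `setBytes()` for the first chunk and `appendBytes()` for each later chunk, so no single buffer has to hold the whole field up front.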
   
   When used with a type conversion shim, the appendBytes() method throws
   an exception. This should be OK because the preceding setBytes() call
   should already have failed: a huge value is not acceptable for numeric
   or date type conversions.
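
The reasoning above can be illustrated with a small sketch. The class `IntConversionShimSketch` is hypothetical and is not Drill's conversion code; it only assumes that a shim parses the full text value in `setBytes()` and cannot extend a value that has already been converted.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical shim that converts text bytes to an int column; not Drill's real class.
class IntConversionShimSketch {
    private int converted;

    // Parses the complete text value; oversized or non-numeric input fails here.
    void setBytes(byte[] value, int len) {
        String text = new String(value, 0, len, StandardCharsets.US_ASCII);
        converted = Integer.parseInt(text.trim()); // throws NumberFormatException
    }

    // An already-converted number cannot be extended, so append is rejected.
    void appendBytes(byte[] value, int len) {
        throw new UnsupportedOperationException(
            "appendBytes not supported when converting to INT");
    }

    int value() {
        return converted;
    }
}
```

A field long enough to need an append could never be a valid int, so under these assumptions the parse in `setBytes()` fails before any `appendBytes()` call matters.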
   
   Added unit tests for the append feature, including the batch overflow
   case (when appending bytes causes the vector or batch to overflow).
   Also added tests to verify that the text reader no longer enforces a
   column width limit, both with and without headers.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> [Text V3 Reader] Unsupported operation error is thrown when select a column 
> with a long string
> ----------------------------------------------------------------------------------------------
>
>                 Key: DRILL-7258
>                 URL: https://issues.apache.org/jira/browse/DRILL-7258
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Anton Gozhiy
>            Assignee: Paul Rogers
>            Priority: Major
>              Labels: arina
>             Fix For: 1.17.0
>
>         Attachments: 100000.tbl
>
>
> *Data:*
> 100000.tbl is attached
> *Steps:*
> # Set exec.storage.enable_v3_text_reader=true
> # Run the following query:
> {code:sql}
> select * from dfs.`/tmp/drill/data/100000.tbl`
> {code}
> *Expected result:*
> The query should return results normally.
> *Actual result:*
> Exception is thrown:
> {noformat}
> UNSUPPORTED_OPERATION ERROR: Drill Remote Exception
>   (java.lang.Exception) UNSUPPORTED_OPERATION ERROR: Text column is too large.
> Column 0
> Limit 65536
> Fragment 0:0
> [Error Id: 5f73232f-f0c0-48aa-ab0f-b5f86495d3c8 on userf87d-pc:31010]
>     org.apache.drill.common.exceptions.UserException$Builder.build():630
>     org.apache.drill.exec.store.easy.text.compliant.v3.BaseFieldOutput.append():131
>     org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseValueAll():208
>     org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseValue():225
>     org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseField():341
>     org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseRecord():137
>     org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseNext():388
>     org.apache.drill.exec.store.easy.text.compliant.v3.CompliantTextBatchReader.next():220
>     org.apache.drill.exec.physical.impl.scan.framework.ShimBatchReader.next():132
>     org.apache.drill.exec.physical.impl.scan.ReaderState.readBatch():397
>     org.apache.drill.exec.physical.impl.scan.ReaderState.next():354
>     org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.nextAction():184
>     org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.next():159
>     org.apache.drill.exec.physical.impl.protocol.OperatorDriver.doNext():176
>     org.apache.drill.exec.physical.impl.protocol.OperatorDriver.next():114
>     org.apache.drill.exec.physical.impl.protocol.OperatorRecordBatch.next():147
>     org.apache.drill.exec.record.AbstractRecordBatch.next():126
>     org.apache.drill.exec.record.AbstractRecordBatch.next():116
>     org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
>     org.apache.drill.exec.record.AbstractRecordBatch.next():186
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>     .......():0
>     org.apache.hadoop.security.UserGroupInformation.doAs():1746
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     .......():0
> {noformat}
> *Note:* works fine with v2 reader. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
