[
https://issues.apache.org/jira/browse/DRILL-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733940#comment-16733940
]
Abhishek Ravi commented on DRILL-6918:
--------------------------------------
The PR fixes the issue for empty topic partitions. However, as mentioned
earlier, the {{NumberFormatException}} still occurs on non-empty topic partitions
for a {{select *}} query whose predicate references a non-existent field.
In that case {{ensureAtLeastOneField}} / {{Scan}} does not add the column,
but {{Project}} does: {{ProjectRecordBatch.setupNewSchema}} adds the column
to the container as Nullable INT. The downstream Filter operator sees this
column and thus hits the {{NumberFormatException}}.
Interestingly, when specific columns are projected (but the predicate still
references the non-existent field), the exception is not hit. The first thing I
noticed is that the physical plan differs from the {{select *}} case: there is
no Project between Scan and Filter, so the non-existent column has not been
added to the schema by the time Filter processes the records. Filter treats the
field as Nullable VarBinary (unlike Project), which is why the issue is not
seen.
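The failing step can be sketched in miniature. Once Project has typed the missing column as Nullable INT, Filter must implicitly cast the VARCHAR literal {{'abc'}} to an integer (the {{StringFunctionHelpers.varTypesToInt}} frame in the stack trace below), and that parse is what throws. The class and method below are illustrative only, not Drill's actual code:

```java
// Illustrative sketch only -- not Drill code. It mimics the implicit
// VARCHAR -> INT cast that the generated Filter sets up once the missing
// predicate column has been materialized as Nullable INT: pushing the
// literal 'abc' through an integer parse raises NumberFormatException,
// which Drill then surfaces as "SYSTEM ERROR: NumberFormatException: abc".
public class ImplicitCastSketch {

    // Hypothetical stand-in for the VARCHAR -> INT conversion performed
    // by StringFunctionHelpers.varTypesToInt in the stack trace.
    static int castVarCharToInt(String literal) {
        return Integer.parseInt(literal); // "abc" -> NumberFormatException
    }

    public static void main(String[] args) {
        try {
            castVarCharToInt("abc");
            System.out.println("no exception");
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException thrown, as in the logs");
        }
    }
}
```

With the specific-columns plan there is no such INT column, the field stays Nullable VarBinary, and no integer parse of the literal is attempted, so the sketch's failing branch is never reached.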
h3. Physical plan for a select * query
{noformat}
select * from `topic2` where somethingelse <> 'abc'
{noformat}
{noformat}
00-00 Screen : rowType = RecordType(DYNAMIC_STAR **): rowcount = 3.0, cumulative cost = {27.3 rows, 69.3 cpu, 6144.0 io, 0.0 network, 0.0 memory}, id = 422
00-01 Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount = 3.0, cumulative cost = {27.0 rows, 69.0 cpu, 6144.0 io, 0.0 network, 0.0 memory}, id = 421
00-02 Project(T0¦¦**=[$0]) : rowType = RecordType(DYNAMIC_STAR T0¦¦**): rowcount = 3.0, cumulative cost = {24.0 rows, 66.0 cpu, 6144.0 io, 0.0 network, 0.0 memory}, id = 420
00-03 SelectionVectorRemover : rowType = RecordType(DYNAMIC_STAR T0¦¦**, ANY somethingelse): rowcount = 3.0, cumulative cost = {21.0 rows, 63.0 cpu, 6144.0 io, 0.0 network, 0.0 memory}, id = 419
00-04 Filter(condition=[<>($1, 'abc')]) : rowType = RecordType(DYNAMIC_STAR T0¦¦**, ANY somethingelse): rowcount = 3.0, cumulative cost = {18.0 rows, 60.0 cpu, 6144.0 io, 0.0 network, 0.0 memory}, id = 418
00-05 Project(T0¦¦**=[$0], somethingelse=[$1]) : rowType = RecordType(DYNAMIC_STAR T0¦¦**, ANY somethingelse): rowcount = 6.0, cumulative cost = {12.0 rows, 24.0 cpu, 6144.0 io, 0.0 network, 0.0 memory}, id = 417
00-06 Scan(table=[[kafka2, topic2]], groupscan=[KafkaGroupScan [KafkaScanSpec=KafkaScanSpec [topicName=topic2], columns=[`**`, `somethingelse`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY somethingelse): rowcount = 6.0, cumulative cost = {6.0 rows, 12.0 cpu, 6144.0 io, 0.0 network, 0.0 memory}, id = 416
{noformat}
h3. Physical plan for a query with specific fields
{noformat}
select LastName from `topic2` where somethingelse <> 'abc'
{noformat}
{noformat}
00-00 Screen : rowType = RecordType(ANY LastName): rowcount = 3.0, cumulative cost = {21.3 rows, 57.3 cpu, 6144.0 io, 0.0 network, 0.0 memory}, id = 225
00-01 Project(LastName=[$0]) : rowType = RecordType(ANY LastName): rowcount = 3.0, cumulative cost = {21.0 rows, 57.0 cpu, 6144.0 io, 0.0 network, 0.0 memory}, id = 224
00-02 Project(LastName=[$1]) : rowType = RecordType(ANY LastName): rowcount = 3.0, cumulative cost = {18.0 rows, 54.0 cpu, 6144.0 io, 0.0 network, 0.0 memory}, id = 223
00-03 SelectionVectorRemover : rowType = RecordType(ANY somethingelse, ANY LastName): rowcount = 3.0, cumulative cost = {15.0 rows, 51.0 cpu, 6144.0 io, 0.0 network, 0.0 memory}, id = 222
00-04 Filter(condition=[<>($0, 'abc')]) : rowType = RecordType(ANY somethingelse, ANY LastName): rowcount = 3.0, cumulative cost = {12.0 rows, 48.0 cpu, 6144.0 io, 0.0 network, 0.0 memory}, id = 221
00-05 Scan(table=[[kafka2, topic2]], groupscan=[KafkaGroupScan [KafkaScanSpec=KafkaScanSpec [topicName=topic2], columns=[`somethingelse`, `LastName`]]]) : rowType = RecordType(ANY somethingelse, ANY LastName): rowcount = 6.0, cumulative cost = {6.0 rows, 12.0 cpu, 6144.0 io, 0.0 network, 0.0 memory}, id = 220
{noformat}
> Querying empty topics fails with "NumberFormatException"
> --------------------------------------------------------
>
> Key: DRILL-6918
> URL: https://issues.apache.org/jira/browse/DRILL-6918
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Kafka
> Affects Versions: 1.14.0
> Reporter: Abhishek Ravi
> Assignee: Abhishek Ravi
> Priority: Minor
> Fix For: 1.16.0
>
>
> Queries with filter conditions fail with {{NumberFormatException}} when
> querying empty topics.
> {noformat}
> 0: jdbc:drill:drillbit=10.10.100.189> select * from `topic2` where Field1 = 'abc';
> Error: SYSTEM ERROR: NumberFormatException: abc
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: a0718456-c053-4820-9bd8-69c683598344 on qa-node189.qa.lab:31010]
> (state=,code=0)
> {noformat}
>
> *Logs:*
> {noformat}
> 2018-12-20 22:36:34,576 [23e3760d-7d23-5489-e2fb-6daf383053ee:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query with id 23e3760d-7d23-5489-e2fb-6daf383053ee issued by root: select * from `topic2` where Field1 = 'abc'
> 2018-12-20 22:36:35,134 [23e3760d-7d23-5489-e2fb-6daf383053ee:foreman] INFO o.a.d.e.s.k.KafkaPushDownFilterIntoScan - Partitions ScanSpec before pushdown: [KafkaPartitionScanSpec [topicName=topic2, partitionId=2, startOffset=0, endOffset=0], KafkaPartitionScanSpec [topicName=topic2, partitionId=1, startOffset=0, endOffset=0], KafkaPartitionScanSpec [topicName=topic2, partitionId=0, startOffset=0, endOffset=0]]
> 2018-12-20 22:36:35,170 [23e3760d-7d23-5489-e2fb-6daf383053ee:frag:0:0] INFO o.a.d.e.s.k.KafkaScanBatchCreator - Number of record readers initialized : 3
> 2018-12-20 22:36:35,171 [23e3760d-7d23-5489-e2fb-6daf383053ee:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 23e3760d-7d23-5489-e2fb-6daf383053ee:0:0: State change requested AWAITING_ALLOCATION --> RUNNING
> 2018-12-20 22:36:35,172 [23e3760d-7d23-5489-e2fb-6daf383053ee:frag:0:0] INFO o.a.d.e.w.f.FragmentStatusReporter - 23e3760d-7d23-5489-e2fb-6daf383053ee:0:0: State to report: RUNNING
> 2018-12-20 22:36:35,173 [23e3760d-7d23-5489-e2fb-6daf383053ee:frag:0:0] INFO o.a.d.e.s.k.d.MessageReaderFactory - Initialized Message Reader : JsonMessageReader[jsonReader=null]
> 2018-12-20 22:36:35,177 [23e3760d-7d23-5489-e2fb-6daf383053ee:frag:0:0] INFO o.a.d.e.store.kafka.MessageIterator - Start offset of topic2:2 is - 0
> 2018-12-20 22:36:35,177 [23e3760d-7d23-5489-e2fb-6daf383053ee:frag:0:0] INFO o.a.d.e.s.kafka.KafkaRecordReader - Last offset processed for topic2:2 is - 0
> 2018-12-20 22:36:35,177 [23e3760d-7d23-5489-e2fb-6daf383053ee:frag:0:0] INFO o.a.d.e.s.kafka.KafkaRecordReader - Total time to fetch messages from topic2:2 is - 0 milliseconds
> 2018-12-20 22:36:35,178 [23e3760d-7d23-5489-e2fb-6daf383053ee:frag:0:0] WARN o.a.d.e.e.ExpressionTreeMaterializer - Unable to find value vector of path `Field1`, returning null instance.
> 2018-12-20 22:36:35,191 [23e3760d-7d23-5489-e2fb-6daf383053ee:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 23e3760d-7d23-5489-e2fb-6daf383053ee:0:0: State change requested RUNNING --> FAILED
> 2018-12-20 22:36:35,191 [23e3760d-7d23-5489-e2fb-6daf383053ee:frag:0:0] ERROR o.a.d.e.physical.impl.BaseRootExec - Batch dump started: dumping last 2 failed batches
> 2018-12-20 22:36:35,191 [23e3760d-7d23-5489-e2fb-6daf383053ee:frag:0:0] ERROR o.a.d.e.p.i.s.RemovingRecordBatch - RemovingRecordBatch[container=org.apache.drill.exec.record.VectorContainer@3ce6a91e[recordCount = 0, schemaChanged = true, schema = null, wrappers = [], ...], state=FIRST, copier=null]
> 2018-12-20 22:36:35,191 [23e3760d-7d23-5489-e2fb-6daf383053ee:frag:0:0] ERROR o.a.d.e.p.i.filter.FilterRecordBatch - FilterRecordBatch[container=org.apache.drill.exec.record.VectorContainer@2057ff66[recordCount = 0, schemaChanged = true, schema = null, wrappers = [org.apache.drill.exec.vector.NullableIntVector@32edcdf2[field = [`T4¦¦**` (INT:OPTIONAL)], ...], org.apache.drill.exec.vector.NullableIntVector@3a5bf582[field = [`Field1` (INT:OPTIONAL)], ...]], ...], selectionVector2=[SV2: recs=0 - ], filter=null, popConfig=org.apache.drill.exec.physical.config.Filter@1d69df75]
> 2018-12-20 22:36:35,191 [23e3760d-7d23-5489-e2fb-6daf383053ee:frag:0:0] ERROR o.a.d.e.physical.impl.BaseRootExec - Batch dump completed.
> 2018-12-20 22:36:35,192 [23e3760d-7d23-5489-e2fb-6daf383053ee:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 23e3760d-7d23-5489-e2fb-6daf383053ee:0:0: State change requested FAILED --> FINISHED
> 2018-12-20 22:36:35,194 [23e3760d-7d23-5489-e2fb-6daf383053ee:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NumberFormatException: abc
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: a0718456-c053-4820-9bd8-69c683598344 on qa-node189.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: NumberFormatException: abc
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: a0718456-c053-4820-9bd8-69c683598344 on qa-node189.qa.lab:31010]
> at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:364) [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:219) [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:330) [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
> Caused by: java.lang.NumberFormatException: abc
> at org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI(StringFunctionHelpers.java:96) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varTypesToInt(StringFunctionHelpers.java:121) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.test.generated.FiltererGen29.doSetup(FilterTemplate2.java:83) ~[na:na]
> at org.apache.drill.exec.test.generated.FiltererGen29.setup(FilterTemplate2.java:52) ~[na:na]
> at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer(FilterRecordBatch.java:196) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema(FilterRecordBatch.java:112) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:101) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:143) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:143) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:83) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:297) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:284) ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_181]
> at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_181]
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) ~[hadoop-common-2.7.0-mapr-1808.jar:na]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:284) [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> ... 4 common frames omitted
> 2018-12-20 22:36:35,209 [23e3760d-7d23-5489-e2fb-6daf383053ee:frag:0:0] WARN o.a.d.exec.rpc.control.WorkEventBus - Fragment 23e3760d-7d23-5489-e2fb-6daf383053ee:0:0 manager is not found in the work bus.
> 2018-12-20 22:36:35,231 [23e3760d-7d23-5489-e2fb-6daf383053ee:frag:0:0] WARN o.a.d.e.w.f.QueryStateProcessor - Dropping request to move to COMPLETED state as query is already at FAILED state (which is terminal).
> {noformat}