[
https://issues.apache.org/jira/browse/DRILL-7388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937758#comment-16937758
]
daniel kelly commented on DRILL-7388:
-------------------------------------
Unfortunately due to the nature of the data I can't share that - but I can give
an anonymised example:-
{code:java}
Using Kafka's console tool showed the contents of a topic:-
bin/kafka-console-consumer.sh --bootstrap-server theserver:9092 --group test --from-beginning --topic myTopic
{"fld1":1,"fld2":"xxx"}
{"fld1":2,"fld2":"xxx"}
{"fld1":3,"fld2":"xxx"}
{"fld1":4,"fld2":"xxx"}
{"fld1":5,"fld2":"xxx"}
{"fld1":6,"fld2":"xxx"}
Using Drill before my fix:-
Apache Drill 1.16.0
"A little SQL for your NoSQL."
apache drill (kafka)> show tables ;
+--------------+------------------------+
| TABLE_SCHEMA | TABLE_NAME |
+--------------+------------------------+
| kafka | myTopic |
| kafka | __consumer_offsets |
+--------------+------------------------+
apache drill (kafka)> select fld1 from `myTopic` ;
+--+
| |
+--+
+--+
No rows selected (7.629 seconds)
After my fix:-
apache drill (kafka)> select fld1 from `myTopic` ;
+------+
| fld1 |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
+------+
6 rows selected (4.345 seconds)
{code}
By examining the drillbit.log output, I could see that there was a partition
containing data - *[topicName=myTopic, partitionId=117, startOffset=0,
endOffset=1]* - but results were not being collated. Checking the code
suggested the problem was with the following line in KafkaRecordReader.java:
while (currentOffset < subScanSpec.getEndOffset() - 1 && msgItr.hasNext()) {
which would never retrieve the data associated with a single offset, i.e. with
currentOffset = 0 and endOffset = 1 the condition evaluates to 0 < 0, which is
false, so the loop body never runs.
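To make the off-by-one concrete, here is a minimal sketch that isolates just the two loop guards (before and after my local change). It is not the reader code itself: the class and method names (OffsetConditionSketch, originalGuard, patchedGuard) are made up for illustration, endOffset stands in for subScanSpec.getEndOffset(), and hasNext stands in for msgItr.hasNext().

```java
// Illustrative sketch only: isolates the two loop guards from the diff.
// OffsetConditionSketch, originalGuard and patchedGuard are hypothetical
// names and are not part of Drill.
class OffsetConditionSketch {

    // Guard before the fix: endOffset - 1 is used as the upper bound.
    static boolean originalGuard(long currentOffset, long endOffset, boolean hasNext) {
        return currentOffset < endOffset - 1 && hasNext;
    }

    // Guard after the local fix: endOffset itself is the upper bound.
    static boolean patchedGuard(long currentOffset, long endOffset, boolean hasNext) {
        return currentOffset < endOffset && hasNext;
    }

    public static void main(String[] args) {
        // Values from the partition in the log: startOffset=0, endOffset=1,
        // with one message available from the iterator.
        long currentOffset = 0;
        long endOffset = 1;
        boolean hasNext = true;

        System.out.println("original guard: " + originalGuard(currentOffset, endOffset, hasNext));
        System.out.println("patched guard:  " + patchedGuard(currentOffset, endOffset, hasNext));
    }
}
```

For the single-record partition the original guard is false on the very first check, so nothing is read, while the patched guard is true and the one record gets consumed.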
Hope that helps illustrate the issue.
> Apache Drill Kafka Storage module fails to return results for partitions
> containing single offset record
> --------------------------------------------------------------------------------------------------------
>
> Key: DRILL-7388
> URL: https://issues.apache.org/jira/browse/DRILL-7388
> Project: Apache Drill
> Issue Type: Bug
> Reporter: daniel kelly
> Priority: Blocker
>
> If a partition only contains one record - e.g.
> [topicName=myTopic, partitionId=117, startOffset=0, endOffset=1]
> no data is returned.
> I fixed this locally with the following code change in
> contrib/storage-kafka:-
> {code:java}
> git diff src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java
> @@ -109,7 +109,7 @@ public class KafkaRecordReader extends AbstractRecordReader {
> currentMessageCount = 0;
>
> try {
> - while (currentOffset < subScanSpec.getEndOffset() - 1 && msgItr.hasNext()) {
> + while (currentOffset < subScanSpec.getEndOffset() && msgItr.hasNext()) {
> ConsumerRecord<byte[], byte[]> consumerRecord = msgItr.next();
> {code}