[ 
https://issues.apache.org/jira/browse/DRILL-7388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937758#comment-16937758
 ] 

daniel kelly commented on DRILL-7388:
-------------------------------------

Unfortunately due to the nature of the data I can't share that - but I can give 
an anonymised example:-

 
{code:java}
using kafka's console tool showed the contents of a a topic:-

bin/kafka-console-consumer.sh --bootstrap-server theserver:9092 --group test 
--from-beginning --topic myTopic
{“fld1”:1,”fld2”:”xxx”}
{“fld1”:2,”fld2”:”xxx”}
{“fld1”:3,”fld2”:”xxx”}
{“fld1”:4,”fld2”:”xxx”}
{“fld1”:5,”fld2”:”xxx”}
{“fld1”:6,”fld2”:”xxx”}


using drill before my fix :-

Apache Drill 1.16.0
"A little SQL for your NoSQL."
apache drill (kafka)> show tables ;


+--------------+------------------------+
| TABLE_SCHEMA |       TABLE_NAME       |
+--------------+------------------------+
| kafka        | myTopic                |
| kafka        | __consumer_offsets     |
+--------------+------------------------+



apache drill (kafka)> select fld1 from `myTopic`  ;
+--+
|  |
+--+
+--+
No rows selected (7.629 seconds)


After my fix:-


apache drill (kafka)> select fld1 from `myTopic`  ;


+------+
| fld1 |
+------+
| 1    |
| 2    |
| 3    |
| 4    |
| 5    |
| 6    |
+------+
6 rows selected (4.345 seconds)
{code}
 

By examining the drillbit.log output, I could see that the was a partition 
containing data  - [*topicName=myTopic, partitionId=117, startOffset=0, 
endOffset=1]*but results were not being collated. Checking the code seemed to 
suggest the problem was with the following line in KafkaRecordReader.java
while (currentOffset < subScanSpec.getEndOffset() - 1 && msgItr.hasNext()) {
which would never retrieve data associated with the single offset - i.e. with 
currentOffset  = 0  and endOffset = 1 - that condition will never match.

hope that helps illustrate the issue.

 

> Apache Drill Kafka Storage module fails to return results for partitions 
> containing single offset record
> --------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-7388
>                 URL: https://issues.apache.org/jira/browse/DRILL-7388
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: daniel kelly
>            Priority: Blocker
>
> If a partition only contains one record - e.g.
> [topicName=myTopic, partitionId=117, startOffset=0, endOffset=1]
> no data is returned.
> I fixed this locally with the following code change in contrib/storage-kafka 
> :-
> {code:java}
> git diff 
> src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java
> @@ -109,7 +109,7 @@ public class KafkaRecordReader extends 
> AbstractRecordReader {
>      currentMessageCount = 0;
>  
>      try {
> -      while (currentOffset < subScanSpec.getEndOffset() - 1 && 
> msgItr.hasNext()) {
> +      while (currentOffset < subScanSpec.getEndOffset() && msgItr.hasNext()) 
> {
>          ConsumerRecord<byte[], byte[]> consumerRecord = msgItr.next();
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to