klsince opened a new issue, #11965:
URL: https://github.com/apache/pinot/issues/11965

   While debugging https://github.com/apache/pinot/issues/11948, we found that 
the server can start the new consuming segment before the broker adds that 
segment to its routing table. In that window, queries are still routed only to 
the older segments, whose records have already been invalidated by data 
ingested into the new consuming segment, so count(*) queries return smaller 
values than expected.
   
   For example, here are logs from a test setup based on RealtimeQuickStart, 
modified to create segments more frequently and to send queries in a tight 
loop to expose the issue. Note the timeline:
   1. The server started consuming data for the new segment, so it started 
invalidating records in older segments.
   2. count(*) queries were routed to the server.
   3. The broker was told to update the routing table for 
dips__0__5__20231107T2314Z, but too late. Because the new records in the new 
segment were not yet visible to queries, the count(*) queries returned smaller 
values than expected.
   ```
   ...
   
   2023/11/07 15:14:04.247 INFO 
[RealtimeSegmentDataManager_dips__0__5__20231107T2314Z] 
[dips__0__5__20231107T2314Z] Starting consumption loop start offset 5000, 
finalOffset null
   
   ...
   
   2023/11/07 15:14:04.366 INFO [ServerQueryLogger] [pqr-4] Processed requestId=72826337000000804,table=dips_REALTIME,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/invalid/limit/value)=5/5/2/-1/0/0/0/0/0,schedulerWaitMs=0,reqDeserMs=1,totalExecMs=0,resSerMs=0,totalTimeMs=1,minConsumingFreshnessMs=1699398843946,broker=Broker_100.117.31.22_8000,numDocsScanned=2000,scanInFilter=0,scanPostFilter=0,sched=FCFS,threadCpuTimeNs(total/thread/sysActivity/resSer)=0/0/0/0
   
   ...
   
   2023/11/07 15:14:04.483 INFO [PinotHelixResourceManager] 
[grizzly-http-server-24] Sent 1 segment refresh messages to brokers for 
segment: dips__0__5__20231107T2314Z of table: dips_REALTIME
   
   ...
   ```
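
The timeline above can be reproduced in miniature with a toy model. This is only a sketch, not Pinot code: `ToyServer`, `ingest`, and `count` are hypothetical names, the segment names are made up, and the model assumes that a newer record for a key invalidates the older record for that key in an earlier segment. It shows how a stale routing table undercounts until the new segment is routed.

```python
class ToyServer:
    """Toy stand-in for a server; NOT the Pinot implementation."""

    def __init__(self):
        self.valid_keys = {}  # segment name -> set of keys still valid there
        self.latest = {}      # key -> segment currently holding its valid record

    def ingest(self, segment, key):
        # Invalidate the previous record for this key, wherever it lives.
        previous = self.latest.get(key)
        if previous is not None:
            self.valid_keys[previous].discard(key)
        self.valid_keys.setdefault(segment, set()).add(key)
        self.latest[key] = segment

    def count(self, routed_segments):
        # count(*) only sees the segments the broker routed the query to.
        return sum(len(self.valid_keys.get(s, ())) for s in routed_segments)


server = ToyServer()
for key in range(5):
    server.ingest("seg_old", key)

routing_table = ["seg_old"]  # broker's view before the refresh message
before = server.count(routing_table)

# Step 1: the server starts consuming the new segment and invalidates old records.
for key in range(3):
    server.ingest("seg_new", key)

# Step 2: the query arrives while the routing table is still stale.
stale = server.count(routing_table)

# Step 3: the broker finally adds the new segment to the routing table.
routing_table.append("seg_new")
fresh = server.count(routing_table)

print(before, stale, fresh)
```

In this model the count drops between steps 1 and 3 and recovers once the new segment is routed, which matches the dip seen in the logs.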


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

