#general
@krishna: @krishna has joined the channel
@murat.migdisoglu: @murat.migdisoglu has joined the channel
@murat.migdisoglu: hello dear pinot community, I'm evaluating Pinot in a POC and comparing it to Druid. One thing that I couldn't find in the docs is the REST API to trigger a batch ingestion. Is the CLI the only way of submitting a batch ingestion job?
@ssubrama: @murat.migdisoglu on the controller, there is a `POST` api: `
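For anyone following along, a minimal sketch of what such a controller call can look like on recent Pinot versions, assuming the `ingestFromFile` endpoint and the default controller port; the endpoint name and parameters here are an assumption, so verify the exact path against your controller's Swagger UI (`/help`):
```
# Hedged example, not taken from this thread: push a small local CSV file for
# batch ingestion directly through the controller REST API (endpoint and
# parameters assumed; batchConfigMapStr is URL-encoded {"inputFormat":"csv"}).
curl -X POST -F file=@/tmp/revenue.csv \
  "http://localhost:9000/ingestFromFile?tableNameWithType=revenue_OFFLINE&batchConfigMapStr=%7B%22inputFormat%22%3A%22csv%22%7D"
```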
@murat.migdisoglu: I have another issue now. During realtime ingestion, after publishing the first segment with 50K rows, Pinot does not ingest any more data. Maybe it is not creating the new segment, I'm not sure. It's an append-type table (`"segmentPushType": "APPEND"`) with `"segmentPushFrequency": "HOURLY"`. Where might the issue be? I can't see any exception in any log file
@snlee: @murat.migdisoglu Can you avoid mentioning `here` from next time? There are more than 600 people in this general channel :wink:
@npawar: did you check the server logs? most likely the consuming segment is not able to complete
@fx19880617: also, can we move the discussion to the <#C011C9JHN7R|troubleshooting> channel?
#random
@krishna: @krishna has joined the channel
@murat.migdisoglu: @murat.migdisoglu has joined the channel
#troubleshooting
@murat.migdisoglu: @murat.migdisoglu has joined the channel
@murat.migdisoglu:
@npawar: can you share your table config and schema?
@murat.migdisoglu:
@npawar: the issue might be the schema name. the table config has `"schemaName": "revenue",` whereas the schema has `"schemaName": "revenue_test_murat",`
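For reference: the schema file's top-level `"schemaName"` (here `revenue_test_murat`) has to match `segmentsConfig.schemaName` in the table config. A minimal sketch of the relevant table-config fragment, with all other fields omitted (the push settings are the ones quoted earlier in the thread):
```
{
  "segmentsConfig": {
    "schemaName": "revenue_test_murat",
    "segmentPushType": "APPEND",
    "segmentPushFrequency": "HOURLY"
  }
}
```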
@npawar: Does your pinot-server log show absolutely no warning/error/exception message?
@npawar: and which version of Pinot are you on? The newer versions should have blocked creating a table config with a missing schema
@murat.migdisoglu: I'll reverify your point with schema mismatch
@murat.migdisoglu: we're running 0.5.0
@murat.migdisoglu: ok I verified the schema @npawar the table's schema is revenue_test_murat
@murat.migdisoglu: I'm tailing the server log
@murat.migdisoglu: and it doesn't print anything
@npawar: did you delete and recreate table config after fixing the schema name?
@murat.migdisoglu: I did, and I can retry. But if that was the issue, why would it ingest only the first 50,000 rows? `"segment.flush.threshold.size": "50000"`
@npawar: because it’s not able to complete the segment. the consumer consumed 50k rows based on this config and is not able to move forward because segment creation is failing
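For context, `segment.flush.threshold.size` lives in the realtime table's `tableIndexConfig.streamConfigs`; a minimal sketch with the topic and broker values as placeholders and other required stream properties omitted:
```
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.consumer.type": "lowlevel",
  "stream.kafka.topic.name": "<your-topic>",
  "stream.kafka.broker.list": "<broker-host>:9092",
  "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
  "segment.flush.threshold.size": "50000"
}
```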
@murat.migdisoglu: the batch process works, by the way
@murat.migdisoglu: but that's another table (offline) afaik
@npawar: can you share the whole controller and server log
@murat.migdisoglu:
```
Could not build segment
java.lang.NullPointerException: null
	at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.addColumnMetadataInfo(SegmentColumnarIndexCreator.java:535) ~[pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
	at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.writeMetadata(SegmentColumnarIndexCreator.java:489) ~[pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
	at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.seal(SegmentColumnarIndexCreator.java:399) ~[pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
	at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:240) ~[pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
	at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:223) ~[pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
	at org.apache.pinot.core.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:127) ~[pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentInternal(LLRealtimeSegmentDataManager.java:742) [pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentForCommit(LLRealtimeSegmentDataManager.java:693) [pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:604) [pinot-all-0.5.0-jar-with-dependencies.jar:0.5.0-d87bbc9032c6efe626eb5f9ef1db4de7aa067179]
	at java.lang.Thread.run(Thread.java:832) [?:?] Pr
```
@murat.migdisoglu: I've found an error
@murat.migdisoglu: but after that, no matter how much data comes into Kafka, it does not generate any more errors
@npawar: like I said before, consumption is going to stop if it cannot create the segment.
@npawar: this seems related:
@npawar: I don't know if this was included in 0.5.0. Checking
@npawar: To unblock, you could try without `aggregateMetrics` , or build from source
@npawar: yup, that fix is not part of 0.5.0. Could you build from source?
@murat.migdisoglu: You're right. It worked without aggregation.
@murat.migdisoglu: For the sake of the POC I'll stop here. But I'm surprised to hit a bug in such a fundamental feature :(
@murat.migdisoglu: Thx a lot for your help
@npawar: I'm also surprised :slightly_smiling_face: this feature is being used in some places, I believe. So my hunch is that it is the combination of `aggregateMetrics: true` + `columnMinMaxValueGeneratorMode: ALL`. I have a feeling it may work fine if you remove columnMinMaxValueGeneratorMode. And fwiw, it has been fixed on master and will be available in the next release
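To make the workaround concrete, a sketch of the two settings in question (everything else omitted; their placement under `tableIndexConfig` is assumed here). Until you're on a build with the fix, either set `aggregateMetrics` to false or drop `columnMinMaxValueGeneratorMode`:
```
"tableIndexConfig": {
  "aggregateMetrics": true,
  "columnMinMaxValueGeneratorMode": "ALL"
}
```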
@npawar: @mayanks this flag is used at LinkedIn, right? How does it work in spite of this:
@mayanks: IIRC, this was a bug that got introduced and we hit the same problem at LinkedIn. I believe that #5862 fixed the issue
@mayanks: @npawar ^^
#docs
@krishna: @krishna has joined the channel
#jdbc-connector
@krishna: @krishna has joined the channel
#lp-pinot-poc
@andrew: hm, i don’t see that happening
@andrew: it’s all showing as REALTIME
@andrew: i thought it was configured to move completed segments to offline
@andrew: i also see that a lot of segments are in a “bad” state
@andrew:
@andrew: if i click one of the bad segments, there’s no further information explaining why
@andrew:
@g.kishore: it's a minor UI bug - it shows bad when it's getting converted from consuming to ONLINE
@g.kishore:
@fx19880617: If all the segments are still on realtime nodes, then there is some issue there
@fx19880617: from my benchmark, once segments are persisted, they will be moved to offline servers
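For reference, one common way this relocation is configured is a `tagOverrideConfig` in the table's `tenants` section, so completed segments get moved to servers carrying a different tag; a sketch assuming default tenant names:
```
"tenants": {
  "broker": "DefaultTenant",
  "server": "DefaultTenant",
  "tagOverrideConfig": {
    "realtimeCompleted": "DefaultTenant_OFFLINE"
  }
}
```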
@fx19880617: do you have the controller log, or we can do a Zoom call to look at this
#roadmap
@krishna: @krishna has joined the channel
#metadata-push-api
@fx19880617:
@fx19880617: @steotia this is the user trying with metadata push
@fx19880617: they are trying to push 2k or 20k segments at once
@fx19880617: it’s still taking about 1+ hour for them
@fx19880617: the data volume is at the couple-TB level
@mayanks: @fx19880617 what is the deep store being used in this case?
@fx19880617: s3
@fx19880617: so we let controller skip the download part
@mayanks: Yeah, we want to skip that as well
@fx19880617: but the idealstate updater coming from all controllers will drag down the upload speed
@fx19880617: thinking: if we set parallelism to 100, then 100 threads are trying to update the idealstate at once
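Roughly, the knob being discussed is the push parallelism in the batch ingestion job spec; a hedged sketch only, with the job type name assumed, since metadata push was still being worked on at the time:
```
# batch ingestion job spec fragment (values illustrative)
jobType: SegmentMetadataPush            # assumed name for the metadata-push job type
pushJobSpec:
  pushParallelism: 100                  # the "100 threads" mentioned above
  pushAttempts: 2
  pushRetryIntervalMillis: 1000
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
```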
@mayanks: I think we may need to ensure there are no race conditions
@mayanks: The issue we have is that not every deepstore has an overwrite blob api
@fx19880617: that’s ok, right?
@fx19880617: if the deep store cannot overwrite
@fx19880617: then you can make a daily directory and put segments with the same segment names
@fx19880617: then push
@fx19880617: and let the refresh code path update the download path as well
@fx19880617: since it’s already updating the crc/segment refresh time etc
@mayanks: Yeah, I think that is fine. The thing we are debating was how to clean up old segments
@mayanks: Ideally, if the upload path does the cleanup, that would be best
@fx19880617: then you can have a cleanup job in Hadoop to do that
@steotia: Does S3 support overwrite?
@fx19880617: yes
@fx19880617: the overwrite is: the user deletes the directory, then pushes
@fx19880617:
@fx19880617: S3 will just do the overwrite
@fx19880617: seems it also supports versioning
@fx19880617: if your blob store has that support and you can somehow inherit the version in the download URI, then it will be fantastic
