#general
@emailvidhi01: @emailvidhi01 has joined the channel
@rishi.shukla: @rishi.shukla has joined the channel
@karinwolok1: :partying_face: :partying_face: :partying_face: Apache Pinot :wine_glass: has officially hit 1 MILLION :tada: downloads! :partying_face: :partying_face: :partying_face:
@weixiang.sun: We are working on offline segment ingestion. Currently we are using TarPush, but the problem is that the controller has to get involved in the data path by downloading the segment. Just curious, how does metadata push keep the controller out of the data path?
@ken: With metadata push, you give the controller the URI of where the segment is located. This is used to update ZooKeeper state, and (if needed) will trigger a download by the server processes. That is why, when doing metadata push, your “deep store” location for segments needs to be a shared file system (S3, HDFS, etc.) that all the servers can access.
@weixiang.sun: @ken Thanks
@weixiang.sun: @elon.azoulay ^^
@elon.azoulay: Yep, but in order to get the metadata, SegmentPushUtils.sendSegmentUriAndMetadata downloads the segment, extracts the metadata, and uploads only the metadata.
@elon.azoulay: From the code:
```
/**
 * This method takes a map of segment downloadURI to corresponding tar file path, and push those segments in metadata mode.
 * The steps are:
 * 1. Download segment from tar file path;
 * 2. Untar segment metadata and creation meta files from the tar file to a segment metadata directory;
 * 3. Tar this segment metadata directory into a tar file
 * 4. Generate a POST request with segmentDownloadURI in header to push tar file to Pinot controller.
 *
 * @param spec is the segment generation job spec
 * @param fileSystem is the PinotFs used to copy segment tar file
 * @param segmentUriToTarPathMap contains the map of segment DownloadURI to segment tar file path
 * @throws Exception
 */
```
@elon.azoulay: At least in Pinot 0.8.0 ^^^. Did it change in a newer version of Pinot?
@weixiang.sun: Is SegmentPushUtils.sendSegmentUriAndMetadata called outside the controller?
@elon.azoulay: Looks like it's called from the ingestion jobs
@weixiang.sun: So the segment download here is not happening in the controller.
@elon.azoulay: Only place I see it called is from ingestion jobs
@elon.azoulay: But the upload segment call happens on the controller: PinotSegmentUploadDownloadRestletResource
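Steps 2 and 3 of the Javadoc quoted above can be sketched roughly as follows. This is a minimal Python illustration, not Pinot's actual implementation (which is Java): the function name `build_metadata_tar`, the output file name, and the assumption that the two metadata files are called `metadata.properties` and `creation.meta` are all chosen here for illustration. Step 1 (copying the tar locally) and step 4 (the POST to the controller) are left out.

```python
import tarfile
from pathlib import Path

# Assumption: these are the "segment metadata and creation meta files"
# that the Javadoc refers to.
METADATA_FILE_NAMES = {"metadata.properties", "creation.meta"}


def build_metadata_tar(segment_tar: Path, work_dir: Path) -> Path:
    """Extract only the metadata files from a segment tar and re-tar them.

    Hypothetical helper: returns the path of a small tar that could be
    pushed to the controller instead of the full segment.
    """
    metadata_dir = work_dir / "segment_metadata"
    metadata_dir.mkdir(parents=True, exist_ok=True)

    # Step 2: copy just the metadata files into a metadata directory.
    with tarfile.open(segment_tar, "r:gz") as tar:
        for member in tar.getmembers():
            file_name = Path(member.name).name
            if member.isfile() and file_name in METADATA_FILE_NAMES:
                source = tar.extractfile(member)
                (metadata_dir / file_name).write_bytes(source.read())

    # Step 3: tar the metadata directory; the data files are not included.
    metadata_tar = work_dir / "segment_metadata.tar.gz"
    with tarfile.open(metadata_tar, "w:gz") as tar:
        for path in sorted(metadata_dir.iterdir()):
            tar.add(path, arcname=path.name)
    return metadata_tar
```

The point of the sketch is that this work runs wherever the ingestion job runs, and only the small metadata tar plus the segment's download URI travel to the controller.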
@karinwolok1: :mega: *Kafka Summit London is looking for speakers!* :mega: Interested in speaking? You have until Dec 20 to submit, so send in your talk now!!! :partying_face:
@npawar: Thanks for the reminder @karinwolok1! This :point_up: is a really great opportunity folks :slightly_smiling_face: Many of you are doing really cool things with Pinot and Kafka, and this is the best platform to share your story about using these 2 systems together. Plus it’s in person at London this time! (though KS virtual experience is also extremely fun!)
@chris.jayakumar: Hello folks, what are the recommended system specs for each of the services required for a Pinot cluster? Is there a formula to calculate this based on the size of the data?
@mayanks: Hello, depending on your data size and workload, you can go anywhere from 4-32 cores and 4-64GB of memory for the serving nodes.
@chris.jayakumar: Is that per service, like controller, server, etc.? Or do you mean overall?
@mayanks: That was for pinot-server. For the controller, the sizing parameter is the total number of tables and segments across all tables; typically 4-8 cores will do the job. For the broker: 4-16 cores and 4GB to 64GB, depending on your workload.
@chris.jayakumar: cool thanks for your help Mayank
#random
@emailvidhi01: @emailvidhi01 has joined the channel
@rishi.shukla: @rishi.shukla has joined the channel
#feat-presto-connector
@lrhadoop143: @lrhadoop143 has joined the channel
#troubleshooting
@lrhadoop143: Hi team, facing issues while accessing a Pinot table in Presto. Getting: Query 20211214_044329_00009_tcxbr failed: java.net.SocketTimeoutException: Connect Timeout
@emailvidhi01: @emailvidhi01 has joined the channel
@rishi.shukla: @rishi.shukla has joined the channel
@lrhadoop143: Hi team, error while running join queries on Pinot data from Presto. ERROR: Query 20211214_102018_00035_4f5rn failed: null value in entry: Server_172.19.0.5_7000=null
@lrhadoop143: Log:
```
java.lang.NullPointerException: null value in entry: Server_172.19.0.5_7000=null
    at com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:32)
    at com.google.common.collect.SingletonImmutableBiMap.<init>(SingletonImmutableBiMap.java:42)
    at com.google.common.collect.ImmutableBiMap.of(ImmutableBiMap.java:72)
    at com.google.common.collect.ImmutableMap.of(ImmutableMap.java:124)
    at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:458)
    at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:437)
    at com.facebook.presto.pinot.PinotSegmentPageSource.queryPinot(PinotSegmentPageSource.java:242)
    at com.facebook.presto.pinot.PinotSegmentPageSource.fetchPinotData(PinotSegmentPageSource.java:214)
    at com.facebook.presto.pinot.PinotSegmentPageSource.getNextPage(PinotSegmentPageSource.java:161)
    at com.facebook.presto.operator.ScanFilterAndProjectOperator.processPageSource(ScanFilterAndProjectOperator.java:280)
    at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:245)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:424)
    at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:307)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:728)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:300)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1079)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:599)
    at com.facebook.presto.$gen.Presto_0_267_SNAPSHOT_ac0dc73____20211214_100300_1.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
@mayanks: What versions of Pinot/Presto are you using?
#pinot-dev
@ilirsuloti: @ilirsuloti has joined the channel
#presto-pinot-connector
@lrhadoop143: Hi team, facing issues while accessing a Pinot table in Presto. Getting: Query 20211214_044329_00009_tcxbr failed: java.net.SocketTimeoutException: Connect Timeout
@xiangfu0: Have you tried with the newest Docker image?
@xiangfu0: We saw issues with some group-by queries.
@xiangfu0: If you can share more logs from your Presto coordinator/worker, that will be very useful for debugging.
@lrhadoop143: Hi Xiang, now I can connect to Presto and am able to run simple queries, but joins and subqueries fail with this error:
```
java.lang.NullPointerException: null value in entry: Server_172.19.0.5_7000=null
    at com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:32)
    at com.google.common.collect.SingletonImmutableBiMap.<init>(SingletonImmutableBiMap.java:42)
    at com.google.common.collect.ImmutableBiMap.of(ImmutableBiMap.java:72)
    at com.google.common.collect.ImmutableMap.of(ImmutableMap.java:124)
    at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:458)
    at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:437)
    at com.facebook.presto.pinot.PinotSegmentPageSource.queryPinot(PinotSegmentPageSource.java:242)
    at com.facebook.presto.pinot.PinotSegmentPageSource.fetchPinotData(PinotSegmentPageSource.java:214)
    at com.facebook.presto.pinot.PinotSegmentPageSource.getNextPage(PinotSegmentPageSource.java:161)
    at com.facebook.presto.operator.ScanFilterAndProjectOperator.processPageSource(ScanFilterAndProjectOperator.java:280)
    at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:245)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:424)
    at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:307)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:728)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:300)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1079)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:599)
    at com.facebook.presto.$gen.Presto_0_267_SNAPSHOT_ac0dc73____20211214_100300_1.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
@xiangfu0: Can you explain the query?
@xiangfu0: Are Presto and Pinot deployed in the same cluster?
@xiangfu0: On the Pinot side, can you try adding the following to your Pinot server configs:
```
pinot.server.instance.currentDataTableVersion=2
pinot.server.grpc.enable=true
pinot.server.grpc.port=8090
```
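Since the thread started with a `Connect Timeout` and the question of whether Presto and Pinot share a cluster, one quick thing to rule out is plain network reachability from the Presto worker to the Pinot server ports. The sketch below is only a generic TCP check, not part of Pinot or Presto; `can_connect` is a hypothetical helper name, and it says nothing about whether the service behind the port is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout_seconds: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout.

    Hypothetical helper for a reachability check; it only proves that
    something is listening on the port, not that it is a Pinot server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Run from a Presto worker, a call such as `can_connect("172.19.0.5", 8090)` would use the server address from the error message and the gRPC port from the config above; a `False` result would suggest a networking or port-mapping problem rather than a query problem.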
#pinot-perf-tuning
@rohitdev.kulshrestha: @rohitdev.kulshrestha has joined the channel
#getting-started
@ilirsuloti: @ilirsuloti has joined the channel
@weixiang.sun: @weixiang.sun has joined the channel