#general
@k.santhiswarup: @k.santhiswarup has joined the channel
@pedro.cls93: Hello, can field transformations be composed? E.g. `"fromDateTime(JSONPATHSTRING(result,'$.AudioLength','00:00:00.000'), 'HH:mm:ss.SSS')"`, where the transformed field is of type `Long`?
@npawar: yes, nesting should be supported
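For reference, a minimal sketch of how such a nested transform could be wired into the table config, assuming the `ingestionConfig.transformConfigs` section; the destination column name `audioLengthMillis` is made up for illustration, and the function composition is the one from the question above:
```json
{
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "audioLengthMillis",
        "transformFunction": "fromDateTime(JSONPATHSTRING(result, '$.AudioLength', '00:00:00.000'), 'HH:mm:ss.SSS')"
      }
    ]
  }
}
```
Here the inner `JSONPATHSTRING` extracts the duration string (falling back to `'00:00:00.000'` when the path is missing) and the outer `fromDateTime` parses it with the given pattern into epoch millis, so the destination column can be a `LONG`.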
@tony: @tony has joined the channel
@grace.walkuski: Hello! I am wondering, if no fields in a query are aggregated, then is there an advantage to using `distinct` over grouping by all the fields? For example, is there a difference in efficiency between these two?
`select distinct species, name from dataSource`
`select species, name from dataSource group by species, name`
@grace.walkuski: they look to be comparable when I run each in the query console
@mayanks: I think the second one might be rewritten as the first one internally
@mayanks: @grace.walkuski ^^
@grace.walkuski: Ok gotcha. so using `distinct` is more efficient internally? but from our perspective it doesn’t matter.
@mayanks: I think internally the group-by is only for aggregation queries, so there's nothing to compare against.
@grace.walkuski: Alright, thanks!
@g.kishore: please use distinct for now, we have some cool enhancements coming that will be applicable to distinct
@grace.walkuski: Ok great!
@grace.walkuski: @yash.agarwal ^^ I’ll leave it as distinct
@darkyin87: @darkyin87 has joined the channel
@sleepythread: Segment loading question: currently I am loading data into Pinot via a Spark job with the following config:
```
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner'
  extraConfigs:
    stagingDir: '
```
@mayanks: Yes absolutely. You can just push the uri to controller
@mayanks: Let me find the doc
@mayanks:
@sleepythread: Currently, when I am uploading a lot of segments into Pinot, the table status moves to BAD state for a long period of time. Is this expected, or have I misconfigured the system?
@mayanks: What do you see in the ideal state and external view in the ZK?
@sleepythread:
@sleepythread: Seems like ZK is not able to sync well.
@mayanks: Hmm seems like gzipped
@mayanks: IIRC there was a way to tell ZooInspector to unzip, will need to recollect the details
@g.kishore: this means there are too many segments
@g.kishore: which is fine but I am guessing you have too many small segments
@mayanks: correct
@g.kishore: Pinot UI will be able to decompress this
@sleepythread: Current segment size is around 80MB. What is the sweet spot for Pinot ?
@g.kishore: 100 to 500mb
@g.kishore: how many segments do you have, 10k+ ?
@jackie.jxt: 2544 based on the screenshot, so not too many
@mayanks: I think the original problem was BAD state
@sleepythread: For a day of data, I currently have about 2000 segments of ~80 MB each.
@mayanks: Which may mean that the server is running out of memory?
@mayanks: How many servers do you have and what's the replication?
@sleepythread: Pinot Server or ZK ?
@sleepythread: 3 servers, replication 1.
@mayanks: Pinot server (if too much metadata in memory). Or servers don't have enough local storage
@mayanks: what's the vm config
@mayanks: are you running latest master or official release?
@sleepythread: 23 cores, 128 GB memory, 1 TB disk.
@sleepythread: I will add more memory to Pinot Server.
@mayanks: latest master or official release?
@sleepythread: 0.7.1
@mayanks: oh ok
@mayanks: in latest release there's a debug endpoint to get some info (so that we don't have to ask a lot of questions)
@sleepythread: Considering 500 MB segments, 90 days of data will have roughly 28k segments.
@sleepythread: This will be interesting with ZK.
@mayanks: xms/xmx on server?
@sleepythread: ```-Xms12G -Xmx28G ```
@sleepythread: This is Server config.
@mayanks: Yeah, check server logs for errors?
@mayanks: Servers store segment metadata on heap; with a large number of segments, maybe the heap size is not sufficient (just guessing)
@sleepythread: I will take a look at the gc.log
@sleepythread: and report back.
@mayanks: Look at the server log first
@mayanks: I have seen use cases go up to 1-1.5 GB per segment without issues, just FYI. There are other factors to consider, though, before increasing the segment size to > 1 GB
@sleepythread: No errors in Server.log.
@sleepythread: Interestingly, the error was only there while the metadata push was happening.
@sleepythread: Once it completed, the table was back in BAD state.
@mayanks: I am confused, so is the table in BAD state after everything is done, or only during push?
@sleepythread: only when SparkSegmentMetadataPushJobRunner.java task was running.
@mayanks: Ok, so now table is GOOD?
@sleepythread: Once it completed, the table is now in GOOD state
@sleepythread: yes.
@sleepythread: ZK is also a vanilla installation, zero config, running on the same machine.
@mayanks: Ok, I need to check how UI reports BAD state. I usually go by ZK
@sleepythread: I will also configure this.
@mayanks: My guess is that it takes a while for the external view to update since the server has to download segments from the deep store, and that's when it is reported BAD by the UI. If so, that should be fixed.
@sleepythread: I have many new batches to load. I will check and let you know the details.
@mayanks: Ok
@aiyer: Hi Team -- is there a way to run a rank function on top of the data in Pinot?
@mayanks: Not natively in Pinot; maybe Presto?
@aiyer: ok i will try that...
@mayanks: We do have support for adding post-aggregation functions, so it should be possible to add in Pinot, I think
@aiyer: ok..
@aiyer: I am thinking about how to model a use case where every row might have multiple versions pushed out from Kafka over a period of time, and if I only want the latest version to be pulled into the aggregation, how to get that...
@mayanks: Use upsert instead?
@aiyer: Yeah, I was thinking about it, but was not sure if it's recommended. Have you guys tried it at prod scale?
@mayanks: Yes Uber is using it in production
@aiyer: Ok, that's really good to know. I will model something and test it this week and reach out if I get into any trouble :slightly_smiling_face: Thanks as always.
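For readers landing here later, a rough sketch of the two pieces an upsert setup needs; the table and column names below are invented for illustration, and only the upsert-relevant keys are shown, not a complete config. The schema declares a primary key:
```json
{
  "schemaName": "orders",
  "primaryKeyColumns": ["order_id"]
}
```
And the realtime table config enables upsert so queries only see the latest version of each key (the docs also call for strictReplicaGroup routing on upsert tables, as far as I recall):
```json
{
  "tableName": "orders",
  "tableType": "REALTIME",
  "upsertConfig": { "mode": "FULL" },
  "routing": { "instanceSelectorType": "strictReplicaGroup" }
}
```
With `FULL` upsert mode, the newest record per primary key (by ingestion order) replaces earlier ones at query time; the Kafka topic should be partitioned by the primary key for this to work correctly.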
#random
@k.santhiswarup: @k.santhiswarup has joined the channel
@tony: @tony has joined the channel
@darkyin87: @darkyin87 has joined the channel
#troubleshooting
@k.santhiswarup: @k.santhiswarup has joined the channel
@aiyer: Hi Team - I am trying to create a hybrid table. I created the OFFLINE and REALTIME versions of the table. I had the Kafka stream loaded with a few events, so as soon as I created the tables, the realtime data showed up in the query console. After that I uploaded a CSV with some data using the CLI tool LaunchDataIngestionJob. It was successful, but when I tried to query the table again, it didn't show the data that I loaded through the batch ingestion. However, when I query the OFFLINE table separately, it shows up. Not sure what could be wrong. Could you please guide? Attaching some screenshots. So the use case would be: I have historical data in my warehouse and new data in a stream. I want the hybrid table so that I can bootstrap using the batch job and then start consuming data from the stream going forward.
@g.kishore: What is the time column? Is the data uploaded to the offline table older than the data in the real-time table?
@aiyer: yeah..
@aiyer: real time ts : Tue May 18 2021 00:00:00
@aiyer: csv ts : Mon May 17 2021 00:00:00
@aiyer: since Kafka had events, the data got loaded to the RT table first
@aiyer: then I ran the offline load job
@aiyer: Discussed with Kishore in a separate chat. This is working as expected: since there should be at least 1 day of overlap between the batch data and the real-time data, May 17th data will be searched for in the realtime table. This boundary condition is explained in this doc -
@pedro.cls93: Hello, is there a way to configure transformations on JSON fields to extract a certain property, where this property may not always appear? I've configured the following:
```json
{
  "columnName": "audioLength",
  "transformFunction": "JSONPATH(result,'$.audioLength')"
}
```
But the result field, configured as:
```json
{
  "name": "result",
  "dataType": "STRING"
}
```
most of the time will have the following content:
```json
{"metadata":{"isMatch":"Y"}}
```
But sometimes it can be something like:
```json
{
  "AudioCreated": "2021-05-06T23:40:28.6629486",
  "AudioLength": "00:04:02.1800000",
  "BlobPath": "068fd3f0-e5d6-499a-bfb0-94491499aba6/9db5efb9-4a72-44ae-a570-8647e1ac896a/33d3c59d-b8e1-4818-be60-124e637fb02b.wav",
  "isValid": true,
  "feedback": "",
  "otherFeedback": "",
  "result": 1,
  "crowdMemberId": "90c97d94-91c3-4587-8c91-26f6e971d52c",
  "tags": null,
  "scriptToExecute": null
}
```
@pedro.cls93: Currently, the way the transformation is set up, it looks like `JSONPATH` always assumes the field exists if the incoming JSON string field (in this case `result`) is not null. Logs from the server instance:
```
java.lang.IllegalStateException: Caught exception while invoking method: public static java.lang.Object org.apache.pinot.common.function.scalar.JsonFunctions.jsonPath(java.lang.Object,java.lang.String) with arguments: [{"metadata":{"isMatch":"Y"}}, $.audioLength]
	at org.apache.pinot.common.function.FunctionInvoker.invoke(FunctionInvoker.java:148) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	at org.apache.pinot.core.data.function.InbuiltFunctionEvaluator$FunctionExecutionNode.execute(InbuiltFunctionEvaluator.java:137) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	at org.apache.pinot.core.data.function.InbuiltFunctionEvaluator.evaluate(InbuiltFunctionEvaluator.java:104) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	at org.apache.pinot.core.data.recordtransformer.ExpressionTransformer.transform(ExpressionTransformer.java:95) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	at org.apache.pinot.core.data.recordtransformer.CompositeTransformer.transform(CompositeTransformer.java:82) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.processStreamEvents(LLRealtimeSegmentDataManager.java:491) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.consumeLoop(LLRealtimeSegmentDataManager.java:402) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:538) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.GeneratedMethodAccessor210.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_282]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_282]
	at org.apache.pinot.common.function.FunctionInvoker.invoke(FunctionInvoker.java:145) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
	... 8 more
Caused by: com.jayway.jsonpath.PathNotFoundException: No results for path: $['audioLength']
	at com.jayway.jsonpath.internal.path.EvaluationContextImpl.getValue(EvaluationContextImpl.java:133) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa^C
```
@jackie.jxt: Hi Pedro. For this use case, you can use `JsonPathString` with 3 arguments, where the third argument is the default value when the field does not exist
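For example, a minimal sketch of that transform config; the default value `'00:00:00.000'` is just illustrative, and note that the sample payload above uses `AudioLength` with a capital A while the earlier config used `$.audioLength`:
```json
{
  "columnName": "audioLength",
  "transformFunction": "JSONPATHSTRING(result, '$.AudioLength', '00:00:00.000')"
}
```
With the third argument supplied, rows whose `result` JSON does not contain the path get the default value instead of throwing `PathNotFoundException`.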
@avasudevan: I do see the schema is created through the UI, but the table is not created…
@tony: @tony has joined the channel
@avasudevan: Pushed a documentation PR, see if this makes sense -
@darkyin87: @darkyin87 has joined the channel
@aaron: I'm getting a lot of errors when trying to run a SegmentCreation job on some Parquet files that were written out by Trino 354. I'll put the errors in a thread. Any ideas?
@aaron:
```
java.io.IOException: Could not read footer: java.io.IOException: Could not read footer for file DeprecatedRawLocalFileStatus{path=file:/tmp/pinot-7dd1e9e9-b1bd-416c-ab4b-1e66a887d7ca/input/20210507_201917_26196_rv8nu_14b41b59-66e3-4f97-9df0-56c76d859102; isDirectory=false; length=3804; replication=1; blocksize=33554432; modification_time=1620419283015; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false}
	at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:248) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]
```
@aaron: ...
@aaron:
```
Caused by: java.io.IOException: can not read class org.apache.parquet.format.FileMetaData: Required field 'codec' was not present! Struct: ColumnMetaData(type:INT32, encodings:[BIT_PACKED, PLAIN_DICTIONARY, RLE], path_in_schema:[date], codec:null, num_values:2, total_uncompressed_size:54, total_compressed_size:72, data_page_offset:4, statistics:Statistics(max:7F 62 34 01, min:7F 62 34 01, null_count:0), encoding_stats:[PageEncodingStats(page_type:DICTIONARY_PAGE, encoding:PLAIN_DICTIONARY, count:1), PageEncodingStats(page_type:DATA_PAGE, encoding:PLAIN_DICTIONARY, count:1)])
```
@mayanks: Seems like an issue in the Parquet reader (outside Pinot)
@mayanks: Quick google search gives:
@mayanks: Does this look familiar ^^
@aaron: Hmm, that does look like the same error message. But this github issue is for parquetjs writing files that can't be read by other clients
@mayanks: Perhaps the same underlying root cause? Point being my guess is the issue is completely outside of Pinot
@aaron: Ok
@aaron: I can also read this parquet file using pandas/pyarrow
@aaron: I understand this is library code outside Pinot, but Pinot is supposed to be able to read parquet files, right?
@mayanks: Yeah, that's true
@mayanks: Pinot definitely should either read it correctly or fail gracefully if it cannot.
@mayanks: I was just giving pointers to debug
@mayanks: Do you have sample parquet file that we can debug from IDE?
@aaron: I'm so sorry, due to IP concerns I can't share the file directly but if I can come up with a minimal repro I will
@aaron: Is parquet the preferred format for getting data into Pinot or is there an easier format I should try?
@mayanks: There's no preferred format; they all just implement a RecordReader interface. If you want to get around this and move fast, you can try any other format: avro/json/orc/..
@mayanks: But if you can indeed repro using a dummy input, we can help debug
@mayanks: Alternatively you can just debug your IP file. You should be able to run the same code from within IDE with minimal effort.
@aaron: Ok, from what I can tell, the issue is that Trino defaults to using ZSTD compression and Pinot seems not to like that
@aaron: If I compress using GZIP it works
@mayanks: I see. Could you please file an issue? Pinot should auto detect this perhaps
@aaron: Sure thing
@mayanks: :thankyou:
@aaron:
@mayanks: That was quick, end-to-end :clap:
@fx19880617: are you using this `org.apache.pinot.plugin.inputformat.parquet.ParquetAvroRecordReader` ?
@fx19880617: can you try this : `org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader`
@aaron: I've been using `org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader` (I think that's what the docs said to do)
@fx19880617: I think it’s the default to use AvroParquet
@fx19880617: you can add a config to switch to NativeParquetReader
#pinot-dev
@richhickey: @richhickey has joined the channel
#pinot-perf-tuning
@tony: @tony has joined the channel
#minion-improvements
@laxman: Have a quick question: which table (OFFLINE or REALTIME) will be picked up when there is a time overlap of data in both REALTIME and OFFLINE?
@g.kishore: offline
@laxman: As per my observation, it looks random. For some tables, it is not picking up data from OFFLINE. I'm trying the following query.
@laxman:
@laxman: ```select $segmentName, count(*) from spanEventView group by $segmentName order by 1 desc limit 1000```
@laxman: For some tables, this fetches data from the offline table, but for some it's just fetching from REALTIME only
@laxman: As you can see in the above screenshot, numSegmentsQueried = 738 (714 REALTIME + 24 OFFLINE)
@laxman: numSegmentsProcessed - Fetched only the REALTIME segments
@g.kishore: the way it works is
@g.kishore: we get the max(timestamp) from table_offline
@g.kishore: then do two queries: one against the OFFLINE table for data up to that time boundary, and one against the REALTIME table for data after it
@laxman: exactly I tried that just now
@laxman: ```select $segmentName, span_id from spanEventView where start_time_millis < 1619996400082 order by 1 limit 1000```
@laxman: `1619996400082` - This is the max timestamp of OFFLINE table
@laxman: Now I can see results from OFFLINE.
@laxman: Thanks so much Kishore for clarifying. Was confused with the results of my earlier query.
@g.kishore: