#general


@rraguram: @rraguram has joined the channel
@neilteng233: Hey, do we have any examples of querying/working with multi-value columns? I cannot find one in the documentation, e.g. getting the first element of a multi-value column.
  @mayanks: How do you want to use it? Note that it is not necessarily ordered.
  @neilteng233: I see. I have a column with a list of addresses of the customer, where each one is a json.
  @mayanks: @jackie.jxt do we preserve MV column order? Or rather, should the client rely on the ordering?
  @jackie.jxt: We do preserve the order in MV columns
  @jackie.jxt: Currently we don't have a function to pick the first element within an MV column, but it should be easy to add or plug in
  @neilteng233: The ordering is not important. I want to know how to fetch the first or the second element in the MV column. Do we have a pinot function to do that?
  @jackie.jxt: In `ArrayFunctions` we have several functions that apply to arrays. You can plug in your own function with the `ScalarFunction` annotation (see the sketch after this thread)
  @neilteng233: Thanks, I will check that out. One more question: do you happen to know the behavior of Trino/Prestodb when it queries an MV column? Is there any function I should apply on the result for it to be recognized as an array in Trino?
  @jackie.jxt: @fx19880617 ^^
  @npawar: You could use a groovy function to fetch a particular element from the array
  @mayanks: I think the ask is for fetching it during query time.
  @npawar: we have groovy for query time too
  @fx19880617: Prestodb should treat mv column as an array
  @npawar: several examples of getting elements from MV columns using groovy:
  @mayanks: Thanks @npawar
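For illustration, a minimal sketch of the plug-in approach mentioned above; the class and the `arrayElementAt` function name are hypothetical (not built-ins), but the `@ScalarFunction` annotation is the standard way to register a custom scalar function:
```java
import org.apache.pinot.spi.annotations.ScalarFunction;

public class MvElementFunctions {

  private MvElementFunctions() {
  }

  // Hypothetical function: returns the element at the given index of a
  // multi-value STRING column, or an empty string if the index is out of range.
  @ScalarFunction
  public static String arrayElementAt(String[] values, int index) {
    return (index >= 0 && index < values.length) ? values[index] : "";
  }
}
```
With the class on the classpath, the function could then be referenced by name in a query, e.g. `arrayElementAt(addresses, 0)` (the column name here is a placeholder).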
@karinwolok1: :wine_glass: New to Apache Pinot and want to understand the basics? Join us for Intro to Apache Pinot meetup today! 10am PDT | 1pm EDT :slightly_smiling_face:
@allison: @allison has joined the channel
@jcwillia: @jcwillia has joined the channel
@sgarud: @sgarud has joined the channel
@vbondugula: @vbondugula has joined the channel
@kelvin: Hi, I'm streaming from Kafka and would like to have a way to uniquely identify messages. In a Kafka consumer, I can do that with the offset. Is it possible to expose Kafka metadata such as offset/timestamp to Pinot clients?
  @mayanks: I suppose you could write a transform function that reads that metadata and populates columns in Pinot schema.
  @fx19880617: actually I think it's a good ask; this could be exposed as a hidden column for messages consumed from Kafka
  @mapshen: @mayanks is there an example for reading that metadata? Not sure what the available fields are.
  @g.kishore: This is an amazing idea and easy to add as part of Kafka decoder
  @g.kishore: @mayanks this is not available as part of the data, which means a transform function cannot do this. What we need is for the Kafka decoder to read this metadata and add it to the generic row. We already have access to this and use it for checkpointing, so it should be easy to add. Very good beginner task
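A rough sketch of the idea @g.kishore describes, purely to illustrate the shape of the change; the column names and the `enrich` helper are hypothetical, and the wiring that passes the Kafka record's offset/timestamp into the decoding step is not shown:
```java
import org.apache.pinot.spi.data.readers.GenericRow;

public final class KafkaMetadataEnricher {

  // Hypothetical "hidden" column names; they would also need to be declared
  // in the Pinot schema to be queryable.
  public static final String OFFSET_COLUMN = "__kafka_offset";
  public static final String TIMESTAMP_COLUMN = "__kafka_timestamp";

  private KafkaMetadataEnricher() {
  }

  // Attaches the Kafka record metadata (already tracked on the consumer side
  // for checkpointing) to a decoded row so it is indexed like any other column.
  public static GenericRow enrich(GenericRow decodedRow, long offset, long timestampMs) {
    decodedRow.putValue(OFFSET_COLUMN, offset);
    decodedRow.putValue(TIMESTAMP_COLUMN, timestampMs);
    return decodedRow;
  }
}
```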
@mapshen: @mapshen has joined the channel

#random


@rraguram: @rraguram has joined the channel
@allison: @allison has joined the channel
@jcwillia: @jcwillia has joined the channel
@sgarud: @sgarud has joined the channel
@vbondugula: @vbondugula has joined the channel
@mapshen: @mapshen has joined the channel

#feat-presto-connector


@nadeemsadim: @nadeemsadim has joined the channel

#troubleshooting


@rraguram: @rraguram has joined the channel
@allison: @allison has joined the channel
@jcwillia: @jcwillia has joined the channel
@jmeyer: Hello :wave: What's the meaning of `{'errorCode': 410, 'message': 'BrokerResourceMissingError'}`? I got it from the `pinot-db` Python client. The query runs fine from the UI though
  @mayanks: This means that the broker for the table was not found. One common cause I have seen in the past is an incorrect table name in the query.
  @jmeyer: Thanks @mayanks, I'll check that !
  @mayanks: I added a bit more details in the FAQ:
  @jmeyer: Perfect
  @fx19880617: Just curious, which python client are you using?
  @jmeyer: Official pinot-db
  @jmeyer: Is there any other recommended Python client ?
  @fx19880617: no I think that’s good
  @jmeyer: :ok_hand:
@sgarud: @sgarud has joined the channel
@vbondugula: @vbondugula has joined the channel
@mike.davis: Hello, are transform configs supported when generating OFFLINE segments? I'm trying to add a new column via a date transformation and getting:
```
Caught exception while gathering stats
org.apache.parquet.io.InvalidRecordException: NEW_FIELD_NAME not found in message schema {
```
ingestionConfig:
```
"ingestionConfig": {
  "transformConfigs": [
    {
      "columnName": "NEW_FIELD_NAME",
      "transformFunction": "fromEpochDays(OLD_FIELD_NAME)"
    }
  ]
},
```
  @npawar: yes it is supported.
  @npawar: the exception looks like it’s coming from parquet? can you share the whole stack trace?
  @mike.davis: yeah I thought it might be a parquet issue:
```
Caught exception while gathering stats
org.apache.parquet.io.InvalidRecordException: NEW_FIELD_NAME not found in message schema { <...schema omitted...> }
    at org.apache.parquet.schema.GroupType.getFieldIndex(GroupType.java:175) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
    at org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordExtractor.extract(ParquetNativeRecordExtractor.java:117) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
    at org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader.next(ParquetNativeRecordReader.java:106) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
    at org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader.next(ParquetRecordReader.java:64) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
    at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:67) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
    at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:42) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:172) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:153) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:102) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
    at org.apache.pinot.tools.admin.command.CreateSegmentCommand.lambda$execute$0(CreateSegmentCommand.java:247) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_292]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_292]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_292]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
Exception caught: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Caught exception while generating segment from file: /data/data_019c5bcb-0401-e7fc-0019-bd01cc97e583_906_6_0.snappy.parquet
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_292]
    at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[?:1.8.0_292]
    at org.apache.pinot.tools.admin.command.CreateSegmentCommand.execute(CreateSegmentCommand.java:274) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
    at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:164) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
    at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:184) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
Caused by: java.lang.RuntimeException: Caught exception while generating segment from file: /data/data_019c5bcb-0401-e7fc-0019-bd01cc97e583_906_6_0.snappy.parquet
    at org.apache.pinot.tools.admin.command.CreateSegmentCommand.lambda$execute$0(CreateSegmentCommand.java:265) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_292]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_292]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_292]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
```
  @mike.davis: specifically I'm using the `ParquetNativeRecordExtractor`
  @mike.davis: I can optionally switch to using plain Avro (non-parquet) if for some reason Native Parquet is lacking some functionality.
  @mike.davis: FWIW the original source is a Snowflake table so I'm exporting into Parquet purely for ingestion into Pinot so the format is somewhat arbitrary.
  @npawar: and what’s the pinot schema?
  @mike.davis: The new field was part of the Pinot schema as a datetime field:
```
{
  "name": "NEW_FIELD_NAME",
  "dataType": "LONG",
  "format": "1:MILLISECONDS:EPOCH",
  "granularity": "1:DAYS"
},
```
  @mike.davis: I can dig into it more on my end; good to know that support is there, but maybe there's an issue with the parquet reader.
@mapshen: @mapshen has joined the channel

#pinot-dev


@nadeemsadim: @nadeemsadim has joined the channel
@mapshen: @mapshen has joined the channel

#community


@mapshen: @mapshen has joined the channel

#getting-started


@lochanie1987: @lochanie1987 has joined the channel
@nadeemsadim: @nadeemsadim has joined the channel