Apache Pinot Daily Email Digest (2022-06-06)

Pinot Slack Email Digest Mon, 06 Jun 2022 19:59:56 -0700

#general

@nimrod.raifer: @nimrod.raifer has joined the channel
@vuppala.kumar: @vuppala.kumar has joined the channel
@vuppala.kumar: Hi Pinot Team, do you know how to create, rename, or drop a column on an existing table? It is tedious to always drop the existing table and create a new one just to alter column names.
@vuppala.kumar: @mayanks @kharekartik any idea on this?
@kharekartik: Hi, you can edit the schema from the UI itself. You can add columns. Rename and drop are not supported since the schema should be backward compatible.
@mayanks: Yes, schema evolution does not require deleting and recreating the table.
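A minimal sketch of the rule described in this thread: adding a column keeps a schema backward compatible, while renaming or dropping one does not. The helper names (`column_names`, `add_column`, `is_backward_compatible`) are hypothetical and only compare column names. This is an illustration of the idea, not Pinot's actual validation logic, which may check more than names.

```python
import copy

# Field-spec groups of a Pinot schema JSON document.
FIELD_GROUPS = ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")


def column_names(schema):
    """Collect every column name declared in the schema."""
    names = set()
    for group in FIELD_GROUPS:
        for spec in schema.get(group, []):
            names.add(spec["name"])
    return names


def add_column(schema, group, name, data_type):
    """Return a copy of the schema with one extra column appended."""
    if group not in FIELD_GROUPS:
        raise ValueError(f"unknown field group: {group}")
    if name in column_names(schema):
        raise ValueError(f"column already exists: {name}")
    updated = copy.deepcopy(schema)
    updated.setdefault(group, []).append({"name": name, "dataType": data_type})
    return updated


def is_backward_compatible(old_schema, new_schema):
    """Simplified check (assumption): every old column must still exist."""
    return column_names(old_schema) <= column_names(new_schema)
```

With this sketch, a schema produced by `add_column` passes `is_backward_compatible` against its predecessor, whereas a schema with a renamed or dropped column fails it, because the old name is no longer present.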
@hamed.ahlesaadat: @hamed.ahlesaadat has joined the channel
@vuppala.kumar: Hi Pinot community, can we copy the data of one table to a new table?
@mark.needham: You can, by copying the segments across. I wrote a blog showing a small example of how to do this -
@seshasendhil: @seshasendhil has joined the channel
@kaiqueras: @kaiqueras has joined the channel
@max: @max has joined the channel

#random

@nimrod.raifer: @nimrod.raifer has joined the channel
@vuppala.kumar: @vuppala.kumar has joined the channel
@hamed.ahlesaadat: @hamed.ahlesaadat has joined the channel
@seshasendhil: @seshasendhil has joined the channel
@kaiqueras: @kaiqueras has joined the channel
@max: @max has joined the channel

#troubleshooting

@nimrod.raifer: @nimrod.raifer has joined the channel
@vuppala.kumar: @vuppala.kumar has joined the channel
@alihaydar.atil: Hello everyone, Is there a reason why GroovyFunctionEvaluator returns null on bindings with null values? Would it cause any side effects to run the script with null bindings? Thanks in advance
@mark.needham: @npawar probably knows best about this.
@npawar: No particular reason. we were just being defensive. this has been changed in the recent master
@alihaydar.atil: Thanks for the answer :+1:
@hamed.ahlesaadat: @hamed.ahlesaadat has joined the channel
@tommaso.peresson: Hi everybody, I have a question for you. I have a table/schema configured like:
```
{
  "OFFLINE": {
    "tableName": "DailyUniqHll_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "timeType": "DAYS",
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "365",
      "replication": "1",
      "timeColumnName": "partition",
      "allowNullTimeValue": false
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "enableDefaultStarTree": false,
      "starTreeIndexConfigs": [
        {
          "dimensionsSplitOrder": [
            "partition",
            "fields.1",
            "fields.2",
            "fields.3",
            "fields.4",
            "fields.5",
            "fields.6",
            "fields.7",
            "fields.8",
            "fields.9"
          ],
          "functionColumnPairs": [
            "SUM__counters.c",
            "DISTINCTCOUNTHLL__hllState"
          ],
          "maxLeafRecords": 1000
        }
      ],
      "enableDynamicStarTreeCreation": true,
      "aggregateMetrics": false,
      "nullHandlingEnabled": false,
      "rangeIndexVersion": 2,
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false
    },
    "metadata": {},
    "ingestionConfig": {
      "batchIngestionConfig": {
        "segmentIngestionType": "APPEND",
        "segmentIngestionFrequency": "DAILY"
      },
      "complexTypeConfig": {
        "fieldsToUnnest": ["fields", "counters"],
        "delimiter": ".",
        "collectionNotUnnestedToJson": "NON_PRIMITIVE"
      }
    },
    "isDimTable": false
  }
}
```
Schema:
```
{
  "schemaName": "ViewElementDailyUniqHll",
  "dimensionFieldSpecs": [
    { "name": "fields.1", "dataType": "STRING" },
    { "name": "fields.2", "dataType": "STRING" },
    { "name": "fields.3", "dataType": "STRING" },
    { "name": "fields.4", "dataType": "STRING" },
    { "name": "fields.5", "dataType": "STRING" },
    { "name": "fields.6", "dataType": "STRING" },
    { "name": "fields.7", "dataType": "STRING" },
    { "name": "fields.8", "dataType": "STRING" },
    { "name": "fields.9", "dataType": "STRING" },
    { "name": "cubeName", "dataType": "STRING" },
    { "name": "list", "dataType": "LONG", "singleValueField": false },
    { "name": "hllState", "dataType": "BYTES" },
    { "name": "counters.c", "dataType": "INT" }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "partition",
      "dataType": "STRING",
      "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
      "granularity": "1:DAYS"
    }
  ]
}
```
When I ingest some data I get a ~10x size increase because of `DISTINCTCOUNTHLL__hllState` in the star tree index. Is this expected? Is there something misconfigured?
@mark.needham: do you mean 10x more than if that field isn't included in the index?
@mayanks: What does hll_state contain? From your config, another HLL will be created where hll_state is the element in that set. Is that what you intend?
@tommaso.peresson: > do you mean 10x more than if that field isn't included in the index? yes
@tommaso.peresson: hll state contains a bytes array representing the pre-estimation state of the HLL algorithm.
@g.kishore: Which column is HLL representing?
@mayanks: It is the hll_state column, which is a serialized HLL, from what I understand @g.kishore.
@mayanks: You do have split on several dimensions and a low max leaf record value, that may be contributing to some (if not all).
@g.kishore: The size increase basically means that one of the fields 1 to 9 has very high cardinality
@g.kishore: And there is not much aggregation happening when the star tree index is created
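One rough way to check the point above before building the index is to measure how many distinct dimension combinations the raw rows contain. The sketch below is an approximation under stated assumptions: `aggregation_ratio` is a hypothetical helper, rows are plain dictionaries, and the ratio only serves as a proxy for how much a star-tree could pre-aggregate. It does not model star nodes or `maxLeafRecords`.

```python
def aggregation_ratio(rows, dimensions):
    """Fraction of rows that survive grouping by the given dimensions.

    A value close to 1.0 means nearly every row has a unique dimension
    combination, so pre-aggregation collapses very little.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    distinct = {tuple(row[d] for d in dimensions) for row in rows}
    return len(distinct) / len(rows)
```

If the ratio is close to 1.0 for the split-order dimensions, each pre-aggregated record would still carry its own aggregated HLL value, which is consistent with the size increase discussed in this thread.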
@luisfernandez: Hey friends, it’s me again, this time with a question about partitions. We have an offline job, which we are testing in our sandbox env, that uploads data to the Pinot offline tables. When that data is ingested and I look at the segments it generates, the metadata in the UI shows the partitions like this: ```{\"numPartitions\":8,\"partitions\":[0,1,2,3,4,5,6,7]``` However, in our prod system, which has a hybrid setup, I always see a single number in the partitions column: ```{\"numPartitions\":8,\"partitions\":[1]``` Is this something I should be concerned about?
@mayanks: Prod looks good, dev is not partitioned
@luisfernandez: does that mean that we have to partition the data better?
@luisfernandez: does this impact performance?
@mayanks: It means your data is not seen as partitioned by Pinot, it will help to fix that if you want thousands of read qps
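A minimal sketch of why the sandbox segments list all eight partitions while the prod segments list one. It assumes integer partition keys and a plain modulo partition function. The function actually used by a table depends on its configuration, and `partitions_in_segment` is a hypothetical helper, not a Pinot API.

```python
def partitions_in_segment(keys, num_partitions):
    """Return the sorted partition ids covered by the keys in one segment."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return sorted({key % num_partitions for key in keys})


# Segment built from unpartitioned input: keys from every partition are mixed.
mixed_segment = partitions_in_segment(range(100), 8)

# Segment built from input that was partitioned first: only keys of partition 1.
partitioned_segment = partitions_in_segment(range(1, 100, 8), 8)
```

In this sketch `mixed_segment` covers partitions 0 through 7 and `partitioned_segment` covers only partition 1, mirroring the two metadata snippets quoted above.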
@karthik.varagini: I am facing a problem loading data into an existing offline table: the data gets overwritten. If I try to load new data, I lose the old data. I am using the following commands. Any suggestions?
```
sudo docker run --rm -ti \
  --network=pinot-demo_default \
  -v /home/XXXX/dna/pinot/lookup2/pinot-quick-start:/home/XXXX/dna/pinot/lookup2/pinot-quick-start \
  --name pinot-batch-table-creation \
  apachepinot/pinot:latest AddTable \
  -schemaFile /home/XXXX/dna/pinot/lookup2/pinot-quick-start/orders-schema.json \
  -tableConfigFile /home/XXXX/dna/pinot/lookup2/pinot-quick-start/orders-table-offline.json \
  -controllerHost manual-pinot-controller \
  -controllerPort 9000 -exec

sudo docker run --rm -ti \
  --network=pinot-demo_default \
  -v /home/XXXX/dna/pinot/lookup/pinot-quick-start:/home/XXXX/dna/pinot/lookup/pinot-quick-start \
  --name pinot-data-ingestion-job \
  apachepinot/pinot:latest LaunchDataIngestionJob \
  -jobSpecFile /home/XXXX/dna/pinot/lookup/pinot-quick-start/docker-job-spec.yml
```
@luisfernandez: I wonder if this is what makes it overwrite: `overwriteOutput: true` in your docker-job-spect copy.yml
@karthik.varagini: sorry i have attached wrong file ... here is the actual one
@troy: @karthik.varagini friendly reminder, please do not use `@-here`. We want to be respectful of everyone in this community slack channel.
@karthik.varagini: ohk @troy
@mark.needham: I think the new data is likely creating a segment with the same name as the old data
@mark.needham: is there a timestamp as one of your columns?
@mark.needham: that's probably the easiest way to ensure a unique segment name
@karthik.varagini: thanks @mark.needham, yes, both segments were being created with the same name... I tried providing the following details, and it solved my problem... thanks a ton
```
segmentNameGeneratorSpec:
  type: fixed
  configs:
    segment.name: 'orders1'
```
@karthik.varagini: my docker spec
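The overwrite discussed in this thread can be pictured as a table keeping its segments in a map keyed by segment name. The sketch below is a simplified model under that assumption. `push_segment` and `time_based_segment_name` are hypothetical helpers, and the naming pattern is only an illustration of deriving names from a time range, not the exact format Pinot generates.

```python
def push_segment(table_segments, segment_name, rows):
    """Upload a segment; a segment with the same name is replaced."""
    table_segments[segment_name] = rows


def time_based_segment_name(table_name, min_time, max_time):
    """Derive a name from the data's time range so different batches differ."""
    return f"{table_name}_{min_time}_{max_time}"


# Two batches pushed under the same name: the second replaces the first.
same_name = {}
push_segment(same_name, "orders", ["old row"])
push_segment(same_name, "orders", ["new row"])

# Two batches named after their time ranges: both are kept.
unique_names = {}
push_segment(unique_names, time_based_segment_name("orders", "2022-06-05", "2022-06-05"), ["old row"])
push_segment(unique_names, time_based_segment_name("orders", "2022-06-06", "2022-06-06"), ["new row"])
```

This matches the suggestion above that a timestamp column is probably the easiest way to get unique segment names. A fixed name, as in the job spec snippet, also works as long as each load uses a different name.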
@mathieu.druart: Hello everyone, I have an offline Pinot table with a multi-valued STRING column, and when I run this query: ```select distinct myMultiValuedColumn from MyTable where otherColumn in ('MY_VALUE') limit 1000``` I get this error: ``` "message": "QueryExecutionError:\njava.lang.UnsupportedOperationException\n\tat org.apache.pinot.segment.spi.index.reader.ForwardIndexReader.readDictIds(ForwardIndexReader.java:84)\n\tat org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readDictIds(DataFetcher.java:418)\n\tat org.apache.pinot.core.common.DataFetcher.fetchDictIds(DataFetcher.java:89)\n\tat org.apache.pinot.core.common.DataBlockCache.getDictIdsForSVColumn(DataBlockCache.java:109)", "errorCode": 200``` If I remove the distinct or the where clause, I have no issue. Am I missing something? Thank you!
@seshasendhil: @seshasendhil has joined the channel
@kaiqueras: @kaiqueras has joined the channel
@max: @max has joined the channel

#getting-started

@nimrod.raifer: @nimrod.raifer has joined the channel
@vuppala.kumar: @vuppala.kumar has joined the channel
@hamed.ahlesaadat: @hamed.ahlesaadat has joined the channel
@seshasendhil: @seshasendhil has joined the channel
@kaiqueras: @kaiqueras has joined the channel
@max: @max has joined the channel

#introductions

@nimrod.raifer: @nimrod.raifer has joined the channel
@vuppala.kumar: @vuppala.kumar has joined the channel
@hamed.ahlesaadat: @hamed.ahlesaadat has joined the channel
@karinwolok1: :wave: Please help us welcome all the new Pinot community members! :wine_glass: We're growing so fast!!! *Would love to know who you are, how you discovered Pinot,* :pinot: *and what brought you here!* :heart: @sandeep278 @abhiram.p @harshvardhanc @jkylling @iamtherealdarknight @nimrod.raifer @vuppala.kumar @hamed.ahlesaadat @sowmya.gowda @karangisreekanth @dave.deep @xiaoyzhu @jaimin @mehmet.tasan @jag959 @pj.kovanen @gunnar.enserro @hareesh.lakshminaraya @rafael.moreno @acching @dangngoctan2012 @priya.shivakumar @arnaud.zdziobeck @teehan @gaetanmorlet @matthew @tommaso.peresson @cesaro.angelo @ghita.saouir @m.ram3sh @sonam.dp42 @archetana @gstein @lukas @kevin.peng @csmithson @fb @yanghao @valdamarin.d @jorick @justin @s.himadri @ahmadreza @attaraas @karthik.challa @kartik.anand @karthik.varagini @pravin.bange1989 @joe.padamadan @jacob.branch @piercarlo.paltro @alex.gartner @madison.s204 @wadodkar @kevin.kamel @carolyn @hui @kingkenway16 @rsohlot @vishnu @visar @rino @adamkeane @dsipple @kearn.kirkwood @richard.bair @marc.kriguer @kozdemir @fritz.wijaya @jmoots @sderegt838 @horaymond6 @randika @zotyarex
@seshasendhil: @seshasendhil has joined the channel
@kaiqueras: @kaiqueras has joined the channel
@max: @max has joined the channel