#general
@aiyer: Hi Team -- How do you recommend handling cases where we need to delete a record due to GDPR/CCPA?
@jackie.jxt: This can be done by the minion `PurgeTask`
@jackie.jxt: Since the record purge is custom logic, you need to implement and plug in the task scheduler and the record purger
@jackie.jxt: See `SegmentPurgerTest` and `SimpleMinionClusterIntegrationTest` for examples
@aiyer: ok.. I will take a look. Thanks Jackie.
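(For illustration, a minimal sketch of the custom purge logic mentioned above, assuming the `RecordPurger` callback used by `SegmentPurger` -- as exercised in `SegmentPurgerTest` -- takes a `GenericRow` and returns whether to drop it; the package names, the `userId` column, and the ID set are assumptions and may differ by Pinot version.)
```java
import java.util.Set;
import org.apache.pinot.core.minion.SegmentPurger.RecordPurger; // assumed location; check your Pinot version
import org.apache.pinot.spi.data.readers.GenericRow;

// Sketch only: drops any row whose userId is in the set of IDs requested for deletion (GDPR/CCPA).
public class GdprRecordPurger implements RecordPurger {
  private final Set<String> _userIdsToDelete; // hypothetical: loaded from your deletion-request store

  public GdprRecordPurger(Set<String> userIdsToDelete) {
    _userIdsToDelete = userIdsToDelete;
  }

  @Override
  public boolean shouldPurge(GenericRow row) {
    // Returning true removes this record from the rewritten segment
    return _userIdsToDelete.contains(row.getValue("userId"));
  }
}
```
(A factory producing this purger is then plugged into the minion task executor, as the referenced integration test demonstrates.)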
@aiyer: Another question -- How does Presto push down aggregations when we do joins? If it doesn't push down and it fetches the rows to do the agg, then it will be slow. My use case is a typical star schema (a fact table and multiple dimension tables). The slice/dice will generally occur on top of these dimensions, for which I will need join capabilities. (Sorry, questions keep coming up as I am doing some tests and thinking about my use case.)
@aiyer: I would expect the aggregation for the fact table to happen on Pinot and then only the mapping of the ids to the values from the dim table to happen in Presto.. Let me know if it's not clear and I will post an example.
@fx19880617: Presto will parse the query and generate the plan. If it's a join followed by an aggregate, Presto will fetch data from both sides; only predicates are pushed down.
@fx19880617: If it's a single-table aggregation followed by a join, it should be a simple aggregation pushdown
@fx19880617: You can also run EXPLAIN on your query to see the Presto-generated plan and the generated PinotQuery
@aiyer: ok let me try this ! Thanks Xiang Fu
@aiyer:
```
presto:default> explain select sum(amount) from txn;
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Output[_col0] => [sum:double]
        Estimates: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
        _col0 := sum
    - RemoteStreamingExchange[GATHER] => [sum:double]
            Estimates: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
        - TableScan[TableHandle {connectorId='pinot_quickstart', connectorHandle='PinotTableHandle{connectorId=pinot_quickstart, schemaName=default, tableName=txn, isQue
                Estimates: {rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}
                sum := PinotColumnHandle{columnName=sum, dataType=double, type=DERIVED}
```
@aiyer: hi @fx19880617 -- What does this mean ?
@aiyer: did this push down the agg?
@fx19880617: you need to use the right arrow to show the full plan
@aiyer: yeah so i tried these two queries --
```
select txName, sa
from (select txtype, sum(amount) sa from txn group by txtype) a
join txtypes b on a.txtype = b.txtype;

select txName, sum(a.amount)
from txn a
join txtypes b on a.txtype = b.txtype
group by txName;
```
@aiyer: the first one did the push down (Pinot query had the group by logic)
@aiyer: the second one did not have the group by in PinotQuery..
@aiyer: Does this behavior look accurate to you ?
@fx19880617: i think so
@aiyer: cool.. I understand now.
@aiyer: thank you !
@fx19880617: this means presto parsed the query into two plans
@fx19880617: one joins after the aggregation, the other joins first
@humengyuk18: What are the limitations when using noDictionaryColumns? I got the following exception when doing an order by on a noDictionaryColumn:
```
[
  {
    "errorCode": 200,
    "message": "QueryExecutionError:
java.lang.IndexOutOfBoundsException
	at java.nio.Buffer.checkBounds(Buffer.java:571)
	at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:264)
	at org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getStringCompressed(VarByteChunkSVForwardIndexReader.java:80)
	at org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:60)
	at org.apache.pinot.core.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:34)
	at org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readStringValues(DataFetcher.java:465)
	at org.apache.pinot.core.common.DataFetcher.fetchStringValues(DataFetcher.java:146)
	at org.apache.pinot.core.common.DataBlockCache.getStringValuesForSVColumn(DataBlockCache.java:194)
	at org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getStringValuesSV(ProjectionBlockValSet.java:94)
	at org.apache.pinot.core.common.RowBasedBlockValueFetcher.createFetcher(RowBasedBlockValueFetcher.java:64)
	at org.apache.pinot.core.common.RowBasedBlockValueFetcher.<init>(RowBasedBlockValueFetcher.java:32)
	at org.apache.pinot.core.operator.query.SelectionOrderByOperator.computePartiallyOrdered(SelectionOrderByOperator.java:237)
	at org.apache.pinot.core.operator.query.SelectionOrderByOperator.getNextBlock(SelectionOrderByOperator.java:178)
	at org.apache.pinot.core.operator.query.SelectionOrderByOperator.getNextBlock(SelectionOrderByOperator.java:73)"
  }
]
```
@mayanks: Hmm, from a query perspective everything should work. My guess is the offset overflowed, but I thought we already switched to long-based offsets. Can you provide more context?
@mayanks: Cc @jackie.jxt @steotia
@jackie.jxt: We are using `int` to store the offset within the chunk, but in the normal case that should not overflow
@jackie.jxt: @humengyuk18 Can you share the segment metadata? What is the longest entry for this column? Does it contain special characters?
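(For context on the question above, this is roughly how a raw, no-dictionary column is declared in the table config; the column name is hypothetical:)
```
"tableIndexConfig": {
  "noDictionaryColumns": ["longTextColumn"]
}
```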
@ricardo.bernardino: Hi everyone! When using the realtime table with upsert, is there any compaction mechanism on segments? Or will they just keep on being created and kept forever? Thanks!
@mayanks: Good question. We have discussed it and will likely do it, but there's no concrete plan yet that I'm aware of.
@ricardo.bernardino: Thanks for the reply! I was under the impression that a design document mentioned a background task that would purge stale entries from segments
@mayanks: Yes, currently that purge job is there for GDPR, and uses the Pinot Minion framework. We need to create one that purges stale entries
@mohitdubey95: @mohitdubey95 has joined the channel
@patidar.rahul8392: Is there any way to generate the schema JSON file for a Pinot table from sample JSON data? I have data with 250+ columns in a Kafka topic, and I am currently writing the Pinot schema JSON file manually. Kindly suggest if there is any way to generate it directly from the sample data and use that as the schema file for Pinot.
@mayanks: There's one for avro:
@mayanks: Perhaps you can look at the code and see if you can contribute one for JSON?
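(Until such a utility exists, one rough starting point is to infer a draft schema from a single sample record. The sketch below is not part of Pinot: it treats every field as a dimension and ignores multi-value and time columns, so the output still needs manual review.)
```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
import java.io.File;
import java.util.Iterator;
import java.util.Map;

// Draft-schema generator: reads one sample JSON record and prints a Pinot-style schema JSON.
public class JsonToPinotSchemaDraft {
  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    JsonNode sample = mapper.readTree(new File(args[0])); // path to a sample record

    ObjectNode schema = mapper.createObjectNode();
    schema.put("schemaName", args.length > 1 ? args[1] : "myTable");
    ArrayNode dimensions = schema.putArray("dimensionFieldSpecs");

    for (Iterator<Map.Entry<String, JsonNode>> it = sample.fields(); it.hasNext(); ) {
      Map.Entry<String, JsonNode> field = it.next();
      ObjectNode spec = dimensions.addObject();
      spec.put("name", field.getKey());
      spec.put("dataType", inferType(field.getValue()));
    }
    System.out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(schema));
  }

  private static String inferType(JsonNode value) {
    if (value.isIntegralNumber()) {
      return "LONG";
    }
    if (value.isFloatingPointNumber()) {
      return "DOUBLE";
    }
    // booleans, strings, nested objects and arrays all fall back to STRING in this sketch
    return "STRING";
  }
}
```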
@aiyer: Question -- Is there any limit to the number of tenants we can have on a single cluster? E.g., is 5000 tenants too many?
@mayanks: Depends on the cluster size. Also, why do you need 5000 tenants?
@aiyer: i was thinking that way purging would be simpler, once the tenant is gone, we can simply purge the related tables and segments..
@aiyer: also to have isolation..
@mayanks: So you will have 5000 tables?
@mayanks: And I didn't get the purging part, how is that simple. You could just delete the table?
@aiyer: yes we could delete the table for the tenant..
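(For reference, dropping a tenant's table is a single controller REST call; the host and table name below are made up:)
```
curl -X DELETE "http://pinot-controller:9000/tables/myTenantTable?type=offline"
```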
@xiong.juliette: @xiong.juliette has joined the channel
@kmvb.tau: For most time-series/audit data, the time criterion is the basic filter. E.g., for one year of data with segments created on a daily basis, there will be 365 segments per year. Even queries that access only the last month or last week will be scheduled to scan all segments, including unnecessary ones. Is it possible to maintain min/max values of the primary time column in the table metadata? Maintaining time-column metadata would help broker-side segment pruning, similar to partitioning.
@mayanks: Pinot already does that and prunes segments based on min-max time stamp in the segment metadata.
@kmvb.tau: So a query which accesses last week's data (7 segments) will be scheduled to scan only 7 segments? Does segment pruning happen at the broker level itself or at the server level?
@mayanks: We have some pruning that happens at the broker level and other pruning at the server level
@mayanks: Yes, only 7 days of segments will be processed. Also, Pinot has sorted and inverted indexes that can be used to further avoid scanning all the data inside these 7 segments
@kmvb.tau: 1. Based on my understanding of the documentation, partitioning helps segment pruning at the broker level itself. 2. For a last-week query, all 365 segments will be scheduled at the broker and only 7 will be processed at the server; the remaining segments will be pruned at the server based on segment metadata. 3. My suggestion is to handle the main time-column criterion similarly to the partition-column criterion, i.e. prune at the broker level to avoid unnecessary scheduling and CPU wastage.
@kmvb.tau: please let me know if my understanding is wrong
@mayanks: Yes, we have optimized these based on real production use cases. There is always a balance: the broker needs to read metadata from ZK, or cache it, so that is the overhead. But these are optimizations we consider at thousands of QPS and millisecond latency. Is your use case in that range? If not, you might be over-optimizing.
@kmvb.tau: ok fine. For now we expect only 500 QPS with sub-100 ms latency. We will test and let you know if there are any issues due to over-scheduling.
@mayanks: Yeah, server-level pruning + partitioning + sorting + inverted index + replica groups will give you much better than that.
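(To make those knobs concrete, a sketch of the relevant table-config sections; the partition column, function, and counts are hypothetical, and the available `segmentPrunerTypes` values depend on the Pinot version:)
```
"tableIndexConfig": {
  "segmentPartitionConfig": {
    "columnPartitionMap": {
      "memberId": { "functionName": "Murmur", "numPartitions": 16 }
    }
  },
  "sortedColumn": ["memberId"]
},
"routing": {
  "segmentPrunerTypes": ["partition"]
}
```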
@aaron: If data ingestion jobs take a lot of memory to create a star tree index, how can I tune that? Does maxLeafRecords affect the memory usage of the segment creation job at all?
@jackie.jxt: Yes, but that also affects the performance gain from the star-tree
@aaron: Do I need to tune maxLeafRecords based on the size of my dataset or is the default of 10000 a sane value?
@aaron: I'm asking because I can't get SegmentCreation jobs to run without an incredible amount of GC overhead, so I'm wondering if I'm doing something wrong
@jackie.jxt: 10k is usually good
@jackie.jxt: Do you know how many dimensions are included in the star-tree?
@aaron: My dimensionsSplitOrder has 7 items, and I've got like ~20 functionColumnPairs
@jackie.jxt: I see. 7 dimensions are not much. Any high cardinality ones?
@jackie.jxt: If you observe lots of GC, increasing the memory limit might help
@aaron: I don't think anything is super high cardinality; one of them could have maybe a few tens of k of values though
@aaron: By increasing the memory limit you mean the java heap size a la -Xms and -Xmx?
@aaron: I'm currently running with `-Xms32G -Xmx32G`
@aaron: And I'm also limiting the segment generation parallelism to 4
@jackie.jxt: Hmm, that's already quite high
@aaron: I have verbose GC logging on and I see a lot of this:
```
2021-05-12T19:10:45.404+0000: [Full GC (Ergonomics) [PSYoungGen: 5921280K->5921277K(8552960K)] [ParOldGen: 21347368K->21347368K(22369792K)] 27268648K->27268646K(30922752K), [Metaspace: 55657K->55657K(59392K)], 55.0908616 secs] [Times: user=1220.86 sys=14.76, real=55.08 secs]
2021-05-12T19:11:40.497+0000: [Full GC (Ergonomics) [PSYoungGen: 5921280K->5921277K(8552960K)] [ParOldGen: 21347368K->21347368K(22369792K)] 27268648K->27268646K(30922752K), [Metaspace: 55657K->55657K(59392K)], 52.7552240 secs] [Times: user=1260.30 sys=13.89, real=52.75 secs]
2021-05-12T19:12:33.252+0000: [Full GC (Ergonomics) [PSYoungGen: 5921280K->5921279K(8552960K)] [ParOldGen: 21347368K->21347368K(22369792K)] 27268648K->27268648K(30922752K), [Metaspace: 55657K->55657K(59392K)], 47.7370731 secs] [Times: user=1237.77 sys=9.23, real=47.74 secs]
```
@jackie.jxt: Are you using the on-heap or off-heap mode?
@aaron: Not sure :grimacing: What is that and how can I find out?
@jackie.jxt: Do you use the spark job to create the segment?
@aaron: No, I'm running it via the docker image
@jackie.jxt: Oh, with the minion task?
@jackie.jxt: In that case it is off-heap
@jackie.jxt: Can you try further reducing the parallelism and see if the GC becomes better?
@aaron: Not with minion either; I'm just running this on the command line
@jackie.jxt: I see. Then maybe just reduce the parallelism and see if the GC goes down
@aaron: Is there such a thing as a too-big segment creation job?
@jackie.jxt: What's the size of your input file and the output segment?
@aaron: The input is about 80 parquet files, 16 GB in total
@aaron: Not sure how big the output segment is because it's never succeeded :open_mouth:
@jackie.jxt: In that case, can you start with single threaded?
@jackie.jxt: 200MB per file on average is not too large
@aaron: Ok so I looked into this a little more -- the cardinality of my dimensions all together is 60,000,000
@aaron: Like, if I multiply the cardinality of each dimension
@aaron: Is that ridiculous?
@jackie.jxt: Not too ridiculous, but chances are the star-tree won't get much compression after removing the dimension
@jackie.jxt: If you can get one segment generated, we can check the segment metadata and see how many extra records were generated for the star-tree
@aaron: Ok cool
@aaron: Btw I realized I had `enableDefaultStarTree` enabled so it was also building one across all dimensions, so I set that to false
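(As an aside, the segment metadata check suggested above can be done against the controller REST API once a segment is generated, or by inspecting `metadata.properties` inside the untarred segment; the table and segment names below are hypothetical, so verify the exact endpoint in the controller's Swagger UI:)
```
curl "http://pinot-controller:9000/segments/myTable/myTable_OFFLINE_0/metadata"
```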
@sleepythread: Need some feedback on the star tree index.
```
"tableIndexConfig" : {
  "starTreeIndexConfigs": [{
    "maxLeafRecords": 1000,
    "functionColumnPairs": ["DISTINCT_COUNT_HLL__user_id", "COUNT__dt"],
    "dimensionsSplitOrder": ["dt", "dim1", "dim2", "dim3", "dim4"]
  }],
  "enableDynamicStarTreeCreation" : true
},
```
This is to optimise the following queries.
```
select dt, DISTINCT_COUNT_HLL(user_id) FROM TABLE GROUP BY dt
select dt, count(1) FROM TABLE GROUP BY dt
select dt, dim2, DISTINCT_COUNT_HLL(user_id) FROM TABLE where dim1 = 3 GROUP BY dt, dim2
select dt, dim2, count(1) FROM TABLE where dim1 = 3 GROUP BY dt, dim2
```
dim1-4 do not have very high cardinality. user_id has the biggest cardinality.
@mayanks: Seems good to me. @jackie.jxt?
@jackie.jxt: Yeah, lgtm
@sleepythread: Thanks
@yupeng: @xd Nice talk at
@mayanks: Is there a recording available?
@yupeng: yes, you can view it after registration
@mayanks: I am registered, will find it.
@xd: Thanks. Hope our experience can help other Pinot enthusiasts!
@xd: Supposedly this link will direct you there, if you register:
@mayanks: :thankyou:
@mayanks: Great talk @xd, just watched it. Largest Pinot table in the world is quite an accomplishment. Congratulations!
#random
@mohitdubey95: @mohitdubey95 has joined the channel
@xiong.juliette: @xiong.juliette has joined the channel
#troubleshooting
@chxing: Hi all, when using Pinot 0.7.1 I found this error in the log:
@chxing: ```Grpc port is not set for instance: Controller_10.252.125.84_9000```
@fx19880617: are you using presto? To enable the gRPC port on the pinot server, please add the configs below to your pinot server configs, then restart the pinot servers:
```
pinot.server.grpc.enable=true
pinot.server.grpc.port=8090
```
@chxing: I'm not using presto now, so is this enabled by default in pinot 0.7.1?
@jackie.jxt: You may ignore it then
@jackie.jxt: Though we should check why we log it for the controller
@chxing: yes, seems it should be a bug?
@chxing: I also got `Admin port is not set for instance: Broker_sj1-pinot-controller-broker-01_8099` in the controller log
@chxing: After setting the conf, I still get the error log in the controller:
```
pinot.server.grpc.enable=true
pinot.server.grpc.port=8090
```
@chxing: Do you know how to fix it? thx
@patidar.rahul8392: Is there any option to remove a dead server from the Pinot UI?
@patidar.rahul8392: I don't want to show the last server with status Dead on the Pinot UI.
@jackie.jxt: @npawar ^^ Do we support dropping server via the UI?
@npawar: no
@npawar: you’ll have to untag it, then remove it from the cluster
@npawar: to untag:
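(In case the link above is lost: the usual flow via the controller REST API is roughly the following; the instance name is made up and the endpoints are from memory, so confirm them against the controller's Swagger UI:)
```
# 1. Untag the server so no tenant/table points at it
curl -X PUT "http://pinot-controller:9000/instances/Server_10.0.0.5_8098/updateTags?tags="
# 2. Drop it from the cluster once it no longer hosts segments
curl -X DELETE "http://pinot-controller:9000/instances/Server_10.0.0.5_8098"
```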
@aiyer: Hi Team -- I am trying to test some simple joins with presto. Seeing this issue for one of the tables even when I just do a select * on that table.
```
java.lang.NullPointerException: null value in entry: Server_172.18.0.3_7000=null
	at com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:32)
	at com.google.common.collect.SingletonImmutableBiMap.<init>(SingletonImmutableBiMap.java:42)
	at com.google.common.collect.ImmutableBiMap.of(ImmutableBiMap.java:72)
	at com.google.common.collect.ImmutableMap.of(ImmutableMap.java:124)
	at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:458)
	at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:437)
	at com.facebook.presto.pinot.PinotSegmentPageSource.queryPinot(PinotSegmentPageSource.java:242)
	at com.facebook.presto.pinot.PinotSegmentPageSource.fetchPinotData(PinotSegmentPageSource.java:214)
	at com.facebook.presto.pinot.PinotSegmentPageSource.getNextPage(PinotSegmentPageSource.java:161)
	at com.facebook.presto.operator.ScanFilterAndProjectOperator.processPageSource(ScanFilterAndProjectOperator.java:276)
	at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:241)
	at com.facebook.presto.operator.Driver.processInternal(Driver.java:418)
	at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301)
	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722)
	at com.facebook.presto.operator.Driver.processFor(Driver.java:294)
	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
	at com.facebook.presto.$gen.Presto_0_254_SNAPSHOT_2999330____20210512_100627_1.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```
@aiyer: I have two tables txn (fact) and txtypes (dimension) .. Just added a handful of rows (2 to 3 rows) to check the query plan..
@aiyer: but when i query txtypes , i am getting this NPE..
@aiyer: on pinot, the select * is working fine for this table.
@aiyer: pls help
@aiyer: these are the images i am using
@aiyer: I pulled latest pinot, still same error..
@mayanks: I think PrestoSQL/Trino may be more up to date than prestodb @fx19880617 ?
@aiyer: i used the image from the document... Is there some other image i should use?
@mayanks: Oh then ignore my comment.
@mayanks: Let me check and get back
@aiyer: sure
@aiyer: Any luck @mayanks ? Or is there anything else I can try to circumvent this problem?
@fx19880617: can you check if your pinot server config has ```pinot.server.instance.currentDataTableVersion=2```
@fx19880617: I think there was a recent pinot server upgrade, which made the internal data protocol version newer than what presto supports
@aiyer: how can i check this ?
@aiyer: is it in the UI
@aiyer: ?
@fx19880617: for k8s
@aiyer: I am running in docker on my local
@fx19880617: check ```kubectl get configmaps pinot-live-server-config -n pinot -o yaml```
@fx19880617: replace `-n pinot` with your namespace
@aiyer: i am not running on kubernetes
@fx19880617: if just docker
@fx19880617: then I guess it doesn’t have
@fx19880617: how do you start the pinot docker?
@aiyer:
```
docker run \
  --network=pinot-demo \
  --name pinot-quickstart \
  -p 9000:9000 \
  -d apachepinot/pinot:latest QuickStart \
  -type hybrid
```
@aiyer: from here
@fx19880617: ic, then this one should have no such configs
@fx19880617: can you try image `apachepinot/pinot:0.7.1`
@aiyer: ok sure..
@aiyer: Yeah with this the NPE is gone and i am able to query from presto... but the Query console on pinot UI is going blank..
@aiyer: is there any way to get the latest pinot working with this ? Not sure of what features i will miss by using 0.7.0..
@aiyer: getting this blank query screen.
@fx19880617: hmm, can you try to clear the cache or try in another browser?
@aiyer: Yeah just tried that..
@aiyer: it worked in incognito..
@aiyer: I will try the join on presto now!! Thank you.
@fx19880617: :thumbsup:
@aiyer: ignore the last message.. i will investigate more and get back.
@fx19880617: Can you try this
```
SET SESSION pinot.limit_larger_for_segment=200000000;
SELECT ...
```
@fx19880617: I thought default pinot.limit_larger_for_segment should be 2147483647
@aiyer: let me try
@aiyer: that didn't make any change.. actually the join is doing something strange.. it's not giving the correct result..
@aiyer:
```
presto:default> select txName,sum(a.amount) from txn a left join txtypes b on a.txtype=b.txtype group by txName;
 txName  |       _col1
---------+-------------------
 Invoice | 2453.240119934082
(1 row)

Query 20210512_173300_00076_imcrr, FINISHED, 1 node
Splits: 100 total, 100 done (100.00%)
0:00 [0 rows, 62B] [0 rows/s, 157B/s]

presto:default> select txName,sa from (select txtype,sum(amount) sa from txn group by txtype) a join txtypes b on a.txtype=b.txtype;
 txName  |         sa
---------+--------------------
 Invoice | 2539.4801235198975
(1 row)

Query 20210512_173312_00077_imcrr, FINISHED, 1 node
Splits: 67 total, 67 done (100.00%)
0:00 [0 rows, 42B] [0 rows/s, 210B/s]
```
@aiyer: the second query's result is correct...
@aiyer: First query is ignoring 2 rows..
@fx19880617: hmm, how large is table txn
@aiyer: it's a very small table.. just 4 records in txn and 2 records in txtypes..
@fx19880617: then try doing the join on all rows and see?
@fx19880617: ```select txName, a.amount, a.txtype from txn a left join txtypes b on a.txtype=b.txtype ```
@aiyer:
```
presto:default> select txName, a.amount, a.txtype from txn a left join txtypes b on a.txtype=b.txtype;
 txName  |       amount       | txtype
---------+--------------------+--------
 Invoice | 2342.1201171875    | 1
 Invoice | 111.12000274658203 | 1
(2 rows)

Query 20210512_173837_00085_imcrr, FINISHED, 1 node
Splits: 68 total, 68 done (100.00%)
0:00 [0 rows, 62B] [0 rows/s, 244B/s]

presto:default> select txName, a.amount, a.txtype from txn a left join txtypes b on a.txtype=b.txtype limit 100;
 txName  | amount  | txtype
---------+---------+--------
 Invoice | 2342.12 | 1
 Invoice | 65.12   | 1
 Invoice | 21.12   | 1
 Invoice | 111.12  | 1
(4 rows)
```
@aiyer: without limit it shows 2 records..
@aiyer: with limit , it shows all 4
@fx19880617: hmm
@fx19880617: what's the generated pinot query for the without-limit case?
@aiyer: ```GeneratedPinotQuery{query=SELECT amount, txtype FROM txn__TABLE_NAME_SUFFIX_TEMPLATE____TIME_BOUNDARY_FILTER_TEMPLATE__ LIMIT 1, format=SQL, table=txn, expectedColumnIndices=[], groupByClauses=0, ```
@fx19880617: can you check presto config in your docker container log
@aiyer: which config?
@fx19880617: what’s the config for ```pinot.limit-large-for-segment```
@fx19880617: you can search log for this
@fx19880617: I think it should be set as 1
@aiyer: not able to find this in the docker logs..
@fx19880617: for quickstart we set this to 1
```
connector.name=pinot
pinot.controller-urls=pinot-quickstart:9000
pinot.controller-rest-service=pinot-quickstart:9000
pinot.limit-large-for-segment=1
pinot.allow-multiple-aggregations=true
pinot.use-date-trunc=true
pinot.infer-date-type-in-schema=true
pinot.infer-timestamp-type-in-schema=true
```
@fx19880617: so it’s intentional
@aiyer: ok got it.. so is this something I can reset ?
@fx19880617: you can try to create this `pinot_quickstart.properties` file
@fx19880617: then mount it to docker container
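(A sketch of that mount, reusing the catalog settings shown earlier with the per-segment limit raised; the etc/catalog path inside the `apachepinot/pinot-presto` image is an assumption, so check it first, e.g. with `docker exec ... ls`:)
```
# pinot_quickstart.properties (catalog config with a larger per-segment limit), e.g.:
#   connector.name=pinot
#   pinot.controller-urls=pinot-quickstart:9000
#   pinot.limit-large-for-segment=2147483647

docker run -d --network=pinot-demo --name presto -p 8080:8080 \
  -v $(pwd)/pinot_quickstart.properties:/opt/presto-server/etc/catalog/pinot_quickstart.properties \
  apachepinot/pinot-presto:0.254-SNAPSHOT-54a7ec79a3-20210512
```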
@fx19880617: also the session config should work
@aiyer: ok i will try the session config first..
@aiyer: actually i had set it earlier when you pointed it out. ```SET SESSION pinot.limit_larger_for_segment=200000000; ```
@aiyer: but that didn't work
@fx19880617: SET SESSION pinot.limit_larger_for_segment=2147483647;
@fx19880617: try this
@fx19880617: then send the query again
@aiyer: ok
@fx19880617: you can check the explain for that
@aiyer: didn't work.. i guess the session setting is not getting picked up..
@fx19880617: hmm
@fx19880617: which presto image are you using
@aiyer: ```243aa15aff9d```
@fx19880617: so explain doesn’t give the right limit?
@aiyer: correct that still has limit 1
@fx19880617: I tried on my side, it gives me the ```GeneratedPinotQuery{query=SELECT DestStateName FROM airlineStats__TABLE_NAME_SUFFIX_TEMPLATE____TIME_BOUNDARY_FILTER_TEMPLATE__ LIMIT 1000,```
@fx19880617: when I do ```presto:default> SET SESSION pinot.limit_larger_for_segment=1000;```
@fx19880617: hmm
@aiyer: hmm .. not sure what's wrong.. are you using the same image?
@fx19880617: I think as long as you can set the session config
@fx19880617: then it should be fine
@aiyer: ok.. another thing.. I tried this as well...
```
presto:default> select * from txn;
       amount       |      id       | txtype | tenant |           ts
--------------------+---------------+--------+--------+-------------------------
 111.12000274658203 | 101_1_2020102 | 1      | 101    | 2021-05-12 12:45:20.744
 2342.1201171875    | 101_1_2020105 | 1      | 101    | 2021-05-12 12:44:50.744
(2 rows)
```
@aiyer: i should have gotten 5 records..
@aiyer: but only got 2..
@aiyer: but i tried select * from airlineStats, that is giving me multiple records...
@fx19880617: hmm
@fx19880617: which catalog are you using
@fx19880617: what’s the explain on this query
@aiyer: ```uery=SELECT AirTime FROM airlineStats__TABLE_NAME_SUFFIX_TEMPLATE____TIME_BOUNDARY_FILTER_TEMPLATE__ LIMIT 1```
@aiyer: ```PinotQuery{query=SELECT AirTime FROM airlineStats__TABLE_NAME_SUFFIX_TEMPLATE____TIME_BOUNDARY_FILTER_TEMPLATE__ LIMIT 1, format=SQL, table=airlineStats, expectedColumnIn```
@fx19880617: I think it’s getting 1 record per segment
@fx19880617: then merge
@fx19880617: but this session config doesn't work, which is really interesting
@fx19880617: can you try presto image: ```apachepinot/pinot-presto:0.254-SNAPSHOT-54a7ec79a3-20210512```
@aiyer: yeah something strange..
@aiyer: yeah i can try
@aiyer: same result ..
@aiyer: i set the session as well..
@fx19880617: hmm
@fx19880617: then change the config file and mount it
@aiyer: ok.. in production, how will this value be decided?
@aiyer: i work out of india time zone.. so i will test this config file thing tomorrow morning my time and update here..
@fx19880617: yes
@fx19880617: in prod, you anyway need to have your own config file and set it accordingly
@aiyer: is there anything in the docs that talks about this? I would like to understand how to set it up..
@mohitdubey95: @mohitdubey95 has joined the channel
@xiong.juliette: @xiong.juliette has joined the channel
@avasudevan: I am trying to connect `superset` to Pinot using `
@avasudevan: Dockers running in my local
@chinmay.cerebro: you might want to double check the broker host and port and make sure it's externally addressable using something like `-p 9000:9000`
@chinmay.cerebro: from your screenshot - doesn't look like that's the case
@fx19880617: broker port is 8000 I think
@chinmay.cerebro: @fx19880617 should we update the URL in this example:
@chinmay.cerebro: > it says : ``
@fx19880617: ic
@fx19880617: it should be 8099 I think
@fx19880617: let me update it
@chinmay.cerebro: and also mention explicitly that it has to be externally addressable
@chinmay.cerebro: or everything is on the same network bridge
@fx19880617: it’s an example
@fx19880617: in the doc, it mentions broker port
@avasudevan: Thanks guys! That worked.
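(For anyone hitting the same issue: the SQLAlchemy URI Superset expects for Pinot via the `pinotdb` driver looks roughly like the following, using the broker port 8099 and controller port 9000 discussed above; the host names assume the docker-network naming from this thread, and the exact path segment can differ between pinotdb versions:)
```
pinot://pinot-broker:8099/query/sql?controller=http://pinot-controller:9000/
```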
#minion-improvements
@jackie.jxt: Added this github issue:
@jackie.jxt: @npawar Please take a look and see if the proposed solution is valid
@jackie.jxt: It should also solve the derived column problem