#general


@albertopang: @albertopang has joined the channel
@sidarthar: @sidarthar has joined the channel
@stuart.edgington: @stuart.edgington has joined the channel
@chxing: Hi All, do we need to add an index (like an inverted index) on the time column, or does the time column already have an index itself? thx
  @ken: I think that if the entries in the time column are sorted, Pinot will figure that out and automatically add a sorted index.
  @mayanks: yes, if a column is sorted Pinot will identify that and add the sorted index. Typically, you don't need to set an inverted index on the time column, either because it is already sorted, or it is (naturally) partitioned enough that Pinot can prune out rows without the need for an explicit index.
  @chxing: Thx @ken @mayanks :grinning:
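To illustrate the advice above, here is a minimal table-config sketch (column names are hypothetical): the time column is left out of `invertedIndexColumns`, since Pinot detects sorted columns in offline segments automatically, and `sortedColumn` is declared mainly so realtime segments are built sorted:
```json
{
  "tableIndexConfig": {
    "sortedColumn": ["tsMillis"],
    "invertedIndexColumns": ["userId", "country"]
  }
}
```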
@cjfudge: @cjfudge has joined the channel
@rgoyal2191: @rgoyal2191 has joined the channel
@nawazshahm: @nawazshahm has joined the channel

#random


@albertopang: @albertopang has joined the channel
@sidarthar: @sidarthar has joined the channel
@stuart.edgington: @stuart.edgington has joined the channel
@cjfudge: @cjfudge has joined the channel
@rgoyal2191: @rgoyal2191 has joined the channel
@nawazshahm: @nawazshahm has joined the channel

#troubleshooting


@albertopang: @albertopang has joined the channel
@deemish2: Hi, I am testing the retention period on a realtime as well as an offline table, with "retentionTimeValue": 1 and "retentionTimeUnit": "HOURS". In that case, data should be deleted from the table after 1 hour. But I can still see the data after 2 hours
  @xiangfu0: The Pinot retention manager kicks off every 6 hours by default, I think. You can configure it in the controller config. See here :
  @deemish2: Thanks Xiang
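For reference, a sketch of the controller setting mentioned above (the key name is taken from recent Pinot versions; verify it against your release):
```properties
# How often the retention manager runs; default is 21600 s = 6 hours.
# Lowering it makes expired segments get cleaned up sooner.
controller.retention.frequencyInSeconds=3600
```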
@bajpai.arpita746462: Hi Everyone, I am also trying out the retention period on a realtime table, with a retention period of 1 hour. But I can still see the segments more than 1 hour after creation. In the screenshot below, the segment got created at 11:30 am today but it is still present
  @xiangfu0:
@sidarthar: @sidarthar has joined the channel
@syedakram93: is there any option to reload segments in parallel?
  @ken: I haven’t seen any such option. Though each server will be re-loading segments in parallel. Also the low-level code loads segments in response to messages received - but I don’t know if that message handling is done in parallel (threaded). Maybe @jackie.jxt or @g.kishore could comment here? :slightly_smiling_face:
  @g.kishore: segment reload done in parallel. you can control it using some low level Helix config dynamically
  @jackie.jxt: @syedakram93 Segment reload on each server is sequential, and that is intentional, because loading in parallel can take too many resources while the server still needs to serve queries. Generating indexes on multiple segments in parallel can also cause memory issues
  @g.kishore: my bad, I thought Helix messages are processed in parallel. @jackie.jxt are we intentionally making it single threaded?
  @jackie.jxt: @g.kishore For a whole-table reload, it is a single message per server. We make it single threaded intentionally because of the risks described above. We can add an option to the Helix message to control the parallelism, but users need to understand its side effects
  @ken: Hi @jackie.jxt - I see `SegmentFetcherAndLoader.addOrReplaceOfflineSegment()`, which I thought was how segments got loaded. But that seems to be called by a msg that’s processing a single segment, not all segments for the server.
  @g.kishore: got it. we should definitely create an issue.. someone might be able to make it multi-threaded and by default numThreads can still be 1
  @ken: @g.kishore Agreed. For example, our client’s cluster is small (6-8 servers) but they all are 32 core/128GB, so beefy enough to handle multiple downloads in parallel. And we pre-build the segment indexes in a Hadoop job, so that reduces the CPU & memory impact during segment loading.
  @jackie.jxt: @ken I think we are discussing 2 different things here. So there are 2 scenarios: 1. Server restart - segments are loaded via the Helix state transition, which happens in parallel and can be configured via Helix config (by default 40 threads) 2. Manual triggered reload when index config is updated in table config - sequential because it requires adding index on the fly
  @jackie.jxt: So basically we want to add an option to use multiple threads for the second scenario
  @jackie.jxt: Created an issue to track this:
  @ken: @jackie.jxt so when I load say 1000 segments for a new offline table (not server restart), I assume that’s another situation where helix state transition msgs are processed in parallel, right?
  @jackie.jxt: Yes, new segments are also processed via the helix state transition
@syedakram93: like no. of threads (num of segments)
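The proposal discussed above (configurable reload parallelism, defaulting to 1 thread to keep today's sequential behavior) could be sketched roughly like this. This is an illustration only; `reloadAll` and the segment names are hypothetical, not Pinot APIs:
```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelSegmentReload {
    // numThreads = 1 preserves the current sequential behavior;
    // larger values trade memory/CPU during index rebuild for speed.
    static int reloadAll(List<String> segments, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        AtomicInteger reloaded = new AtomicInteger();
        for (String segment : segments) {
            pool.submit(() -> {
                // In real Pinot this step would rebuild indexes and
                // atomically swap in the reloaded segment.
                reloaded.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return reloaded.get();
    }

    public static void main(String[] args) {
        int count = reloadAll(List.of("seg_0", "seg_1", "seg_2"), 2);
        System.out.println("Reloaded " + count + " segments");
    }
}
```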
@stuart.edgington: @stuart.edgington has joined the channel
@cjfudge: @cjfudge has joined the channel
@cjfudge: Hello - It appears that Pinot uses an empty Kafka consumer group id (low level consumer) - however I think this is being deprecated on the Kafka side, as I see this in the Kafka log... will this be a problem? > Support for using the empty group id by consumers is deprecated and will be removed in the next major release
  @g.kishore: I don't think this will be a problem. Pinot does not rely on the Kafka consumer group. We probably used an empty group id because of the API available at the time. We should be able to use the new API without a change in functionality.
@anusha.munukuntla: Hi, I am trying to rotate the logs in Pinot. This is the log4j file which I am using, but the logs are not rotating. Could someone please help me out with it?
  @ken: I assume you’ve checked the various Pinot process logs (controller, broker, server) and you don’t see any Log4J-related errors or warnings at startup, right?
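For comparison, a typical Log4j 2 rolling-file appender looks like the sketch below (paths, pattern, and sizes are illustrative). Note that it only takes effect if the appender is also referenced under `<Loggers>` and the process actually loads this file (e.g. via `-Dlog4j2.configurationFile=...`):
```xml
<RollingFile name="pinotServerLog" fileName="logs/pinotServer.log"
             filePattern="logs/pinotServer.log.%d{yyyy-MM-dd}.%i.gz">
  <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p [%c{1}] %m%n"/>
  <Policies>
    <!-- Roll at the date boundary or when the file exceeds 100 MB. -->
    <TimeBasedTriggeringPolicy/>
    <SizeBasedTriggeringPolicy size="100MB"/>
  </Policies>
  <!-- Keep at most 10 rolled files per period. -->
  <DefaultRolloverStrategy max="10"/>
</RollingFile>
```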
@rgoyal2191: @rgoyal2191 has joined the channel
@nawazshahm: @nawazshahm has joined the channel
@qianbo.wang: Hi, on this , it suggests checking the server logs for the reason a "table is in a bad state". Can anyone help and specify which server logs I should look into (i.e. broker, controller, etc)? And is searching for the table name sufficient to find the error, or would any pattern work? Thanks in advance
  @mayanks: Hello, server here implies `pinot-server`. You can grep for the segment name which is in the bad state.
  @qianbo.wang: Will try. Thanks!
@chxing: Hi. I found that there are many segments with long suffixes in HDFS (used as deep storage in Pinot). Does that mean those segments failed to upload to deep storage? thx
@chxing:

#pinot-dev


@rgoyal2191: @rgoyal2191 has joined the channel
@grace.walkuski: Hi! I’m using the pinot jdbc client and I’m getting a timeout:
```
java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Request timed out to {broker url} of 60000 ms
	at org.apache.pinot.client.JsonAsyncHttpPinotClientTransport$BrokerResponseFuture.get(JsonAsyncHttpPinotClientTransport.java:173) ~[pinot-java-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.client.JsonAsyncHttpPinotClientTransport$BrokerResponseFuture.get(JsonAsyncHttpPinotClientTransport.java:152) ~[pinot-java-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.client.JsonAsyncHttpPinotClientTransport$BrokerResponseFuture.get(JsonAsyncHttpPinotClientTransport.java:123) ~[pinot-java-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.client.JsonAsyncHttpPinotClientTransport.executeQuery(JsonAsyncHttpPinotClientTransport.java:102) ~[pinot-java-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.client.Connection.execute(Connection.java:127) ~[pinot-java-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.client.Connection.execute(Connection.java:96) ~[pinot-java-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.client.PreparedStatement.execute(PreparedStatement.java:72) ~[pinot-java-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.client.PinotPreparedStatement.executeQuery(PinotPreparedStatement.java:193) ~[pinot-jdbc-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.client.PinotPreparedStatement.execute(PinotPreparedStatement.java:160) ~[pinot-jdbc-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
```
Where is the `60000` ms being set? Can I increase it? Thanks!
  @xiangfu0: I think this is set at broker/server side. pinot.broker.timeoutMs pinot.server.query.executor.timeout
  @grace.walkuski: Gotcha, thanks!
  @grace.walkuski: when I run the same query directly against the database via the Pinot UI, it runs and takes more than 60000ms. If it's set at the broker level, wouldn't it time out there too?
  @ken: Hi @xiangfu0 - isn’t this error coming from the `AsyncHttpClient`, which is being used by `JsonAsyncHttpPinotClientTransport`? If so, then the only way I see of changing the connection timeout is via system properties, e.g. `-Dcom.ning.http.client.AsyncHttpClientConfig.defaultConnectionTimeoutInMS=120000`, but I haven’t tried that. (also defaultRequestTimeoutInMS, I think)
  @xiangfu0: the ui side timeout parameter is carried with the query itself, so both broker and server side will override it for that query
  @xiangfu0: hmmm
  @xiangfu0: ah, you mean it’s http timeout
  @xiangfu0: not the query timeout
  @ken: I think so, based on the stack trace
  @xiangfu0: hmmm
  @xiangfu0:
  @xiangfu0: from the code, the client side sets 1000 days as the timeout
  @xiangfu0: in `JsonAsyncHttpPinotClientTransport.java`
  @ken: I think that’s the timeout for how long the `BrokerResponseFuture` will wait for the HTTP client to return a result (essentially unbounded). But I think the HTTP client is throwing the timeout exception here.
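Following ken's suggestion above, the `AsyncHttpClient` limits are plain JVM system properties (property names as quoted in this thread), so they can be set before the Pinot JDBC connection is created. A minimal sketch, with an illustrative value; whether your client version honors these properties should be verified:
```java
public class ClientTimeoutConfig {
    // Sets the (com.ning) AsyncHttpClient request/connection timeouts and
    // returns the request-timeout value so callers can confirm it took effect.
    static String configure(String millis) {
        System.setProperty(
            "com.ning.http.client.AsyncHttpClientConfig.defaultRequestTimeoutInMS", millis);
        System.setProperty(
            "com.ning.http.client.AsyncHttpClientConfig.defaultConnectionTimeoutInMS", millis);
        return System.getProperty(
            "com.ning.http.client.AsyncHttpClientConfig.defaultRequestTimeoutInMS");
    }

    public static void main(String[] args) {
        // Must run before the Pinot JDBC connection is created.
        System.out.println(configure("120000"));
    }
}
```
Equivalently, these can be passed on the command line as `-D...=120000` flags, as in ken's message.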

#getting-started


@mosiac: @mosiac has joined the channel
@stuart.edgington: @stuart.edgington has joined the channel

#releases


@stuart.edgington: @stuart.edgington has joined the channel
@rgoyal2191: @rgoyal2191 has joined the channel