abstractdog commented on PR #5613: URL: https://github.com/apache/hive/pull/5613#issuecomment-2612783611
> > > > > > What is the use case for that service? can't I check the query history in HUE or DAS (removed for some storage reason), etc?
> > > > >
> > > > > Please take a look at #5319 which is being worked on by [rtrivedi12](https://github.com/rtrivedi12). I think it provides some extra details for active queries. cc @nrg4878
> > > >
> > > > looks like #5319 is completely different: it uses the well-known SHOW PROCESSLIST for live queries (live == recent == present in HS2 memory), whereas the Query History Service is meant to be a scalable historical query service, scalable in the sense that it uses the Iceberg table format.
> > > >
> > > > HUE/DAS might work from different sources, like the protobuf history, whose data source is also created by a query hook, but this service aims to redesign the way data is persisted while trying to use the same or similar field names that have already been implemented by Impala.
> > > >
> > > > the current HiveProtoLoggingHook contains a lot of storage details (e.g. rolling over files and such), which makes it look a bit less modern compared to e.g. the Iceberg format, with which we win everything (in terms of performance, for instance) that we achieve by integrating Iceberg into our product.
> > >
> > > is that supposed to do the same thing as the Impala profile: https://github.com/apache/impala/blob/fdc43466350db4437b3e917d4ff24dac58af63c3/testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_v2_default.expected.txt#L1445?
> >
> > if you mean the corresponding Impala table, that's implemented in https://issues.apache.org/jira/browse/IMPALA-12426. I can see different upstream commits for that; this is the closest one: [apache/impala@711a9f2](https://github.com/apache/impala/commit/711a9f2bad84f92dc4af61d49ae115f0dc4239da)
>
> their table is sys.impala_query_log
>
> have you checked how they implemented it? maybe they used some tricks to improve perf / make it a background activity, etc.
in detail, no. we discussed high-level problems, like what the schema would look like and how the table is supposed to be compacted (it's not compacted automatically; that should be taken care of by the platform)

when it comes to performance details, what I tried to achieve is:

1. transform the query data into a query history record: this is sync, I admit; let me add extra logging:
```java
long start = Time.monotonicNow();
QueryHistoryRecord record = createRecord(driverContext);
LOG.debug("Created history record (in {}ms): {}", Time.monotonicNow() - start, record);
```
2. the rest of the work is done async
3. set maxBatchSize to 100 by default and defined a memory limit too, so every 100 query records should be written in one batch... I felt this was a good tradeoff between too many small files vs. too infrequent writes

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
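For readers following the batching discussion in step 3, here is a minimal sketch of that enqueue/flush tradeoff. The class name `BatchingHistoryWriter`, its method names, and the use of plain strings as records are hypothetical simplifications for illustration; the actual PR writes `QueryHistoryRecord`s to an Iceberg table and also enforces a memory limit, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the batching strategy: records are enqueued
// cheaply on the query path and flushed in batches of maxBatchSize,
// trading "too many small files" against "too infrequent writes".
public class BatchingHistoryWriter {
    private final int maxBatchSize;
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<List<String>> flushedBatches = new ArrayList<>();

    public BatchingHistoryWriter(int maxBatchSize) {
        this.maxBatchSize = maxBatchSize;
    }

    // Called from the query hook path: a non-blocking enqueue; reaching
    // a full batch triggers a flush (done asynchronously in reality).
    public void enqueue(String record) {
        queue.offer(record);
        if (queue.size() >= maxBatchSize) {
            flush();
        }
    }

    // In the real service a flush would be a single table append
    // (e.g. one Iceberg write) covering the whole batch.
    public synchronized void flush() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatchSize);
        if (!batch.isEmpty()) {
            flushedBatches.add(batch);
        }
    }

    public List<List<String>> getFlushedBatches() {
        return flushedBatches;
    }
}
```

With `maxBatchSize = 100` as in the PR's default, 100 completed queries produce one write rather than 100 tiny files, at the cost of the last partial batch waiting until the next flush.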