Hi,
> Now, when I query I get (1 | Mickey) but I never get (1 | Tom) as its in
> old parquet file. So doesn't incremental query run on old parquet files ?

Could you share the command you are using for incremental query? Specific
config is required by hoodie for doing incremental queries. Please see
example here
<https://hudi.apache.org/docs/docker_demo.html#step-7-b-incremental-query-with-spark-sql>
and more documentation here
<https://hudi.apache.org/docs/querying_data.html#spark-incr-query>.
Please try this and let me know if it works as expected.

Thanks
Satish

On Fri, May 29, 2020 at 5:18 AM tanujdua <[email protected]> wrote:

> Hi,
> We have a requirement where we keep audit_history of every change and
> sometimes query on that as well. In RDBMS we have separate tables for
> audit_history. However in HUDI, history is being created at every ingestion
> and I want to leverage so I do have a question on incremental query.
> Does incremental query runs on latest parquet file or on all the parquet
> files in the partition ? I can see it runs only on latest parquet file.
>
> Let me illustrate more what we need. For eg we have data with 2 columns -
> (id | name) where id is the primary key.
>
> Batch 1 -
> Inserted 2 record --> 1 | Tom ; 2 | Jerry
> A new parquet file is created say 1.parquet with these 2 entries
>
> Batch 2 -
> Inserted 2 records --> 1 | Mickey ; 3 | Donald . So here primary key with
> 1 is updated from Tom to Mickey
> A new parquet file is created say 2.parquet with following entries -
> 1 | Mickey (Record Updated)
> 2 | Jerry (Record Not changed and retained)
> 3 | Donald (New Record)
>
> Now, when I query I get (1 | Mickey) but I never get (1 | Tom) as its in
> old parquet file. So doesn't incremental query run on old parquet files ?
>
> I can use plain vanilla spark to achieve but is there any better way to
> get the audit history of updated rows using HUDI
> 1) Using spark I can read all parquet files (without hoodie) -
> spark.read().load(hudiConfig.getBasePath() + hudiConfig.getTableName() +
> "//*//*//*.parquet");
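
For reference, the "specific config" mentioned in the reply can be sketched as a small Java helper that only assembles the read options. This is a minimal sketch, not the poster's code: the class name `HudiIncrementalOptions`, the method `build`, and the instant value are hypothetical. The option keys are the ones listed in the Hudi querying documentation for recent releases; older releases used `hoodie.datasource.view.type` in place of the query-type key, so check the names against the Hudi version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Assembles Spark datasource options for a Hudi incremental query (illustrative helper). */
class HudiIncrementalOptions {

    // Option keys as documented for the Hudi Spark datasource.
    // Assumption: a release that uses "hoodie.datasource.query.type";
    // older releases used "hoodie.datasource.view.type" instead.
    static final String QUERY_TYPE = "hoodie.datasource.query.type";
    static final String BEGIN_INSTANT = "hoodie.datasource.read.begin.instanttime";
    static final String END_INSTANT = "hoodie.datasource.read.end.instanttime";

    /**
     * @param beginInstant commit instant to pull changes after (required)
     * @param endInstant   optional upper bound on the commit instant; may be null
     */
    static Map<String, String> build(String beginInstant, String endInstant) {
        if (beginInstant == null || beginInstant.isEmpty()) {
            throw new IllegalArgumentException("beginInstant is required for an incremental query");
        }
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put(QUERY_TYPE, "incremental");
        opts.put(BEGIN_INSTANT, beginInstant);
        if (endInstant != null) {
            opts.put(END_INSTANT, endInstant);
        }
        return opts;
    }

    public static void main(String[] args) {
        // The instant below is a made-up example value, not one from the thread.
        Map<String, String> opts = build("20200529051800", null);
        opts.forEach((k, v) -> System.out.println(k + "=" + v));
        // With Spark and the Hudi bundle on the classpath, the map would be passed as:
        //   spark.read().format("org.apache.hudi").options(opts).load(basePath);
    }
}
```

Keeping the options in a plain map makes it easy to log exactly what was sent to the reader, which is the information the reply asks for.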
