Hi,
We have created a table with a two-level partition structure (year/month). In
our Spark Streaming layer we receive micro-batches of, say, 10 rows, and for
each batch we need to look up the corresponding records in HUDI. We are
currently reading it like this:
// Read from HUDI
Dataset<Row> df = spark.read()
                       .format("hudi")
                       .schema(schema)
                       .load(<base_path> + <table_name> + "/*/*");
// Apply filters
df = df.filter(df.col("year").isin(<vals>))
       .filter(df.col("month").isin(<vals>))
       .filter(df.col("id").isin(<vals>));
Is this the best way to read the data? Will HUDI take care of reading only
from the matching partitions, or do we need to handle that ourselves? For
example, if we needed to read just one row, we could build the full partition
path and load it directly, which would read the parquet file from that single
partition quickly; but here our requirement is to read data from multiple
partitions.
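
For reference, here is a rough sketch of how that direct-path idea could be
extended to multiple partitions, since DataFrameReader.load() accepts a
varargs list of paths. The names basePath, tableName, years, months, and
idVals are illustrative placeholders, not from our actual code:

import java.util.ArrayList;
import java.util.List;

// Build one path per (year, month) pair present in the micro-batch,
// assuming the layout <base_path>/<table_name>/<year>/<month>
List<String> paths = new ArrayList<>();
for (String year : years) {
    for (String month : months) {
        paths.add(basePath + tableName + "/" + year + "/" + month);
    }
}

// load(String...) scans only the listed partition directories;
// the row-level id filter is still applied afterwards
Dataset<Row> df2 = spark.read()
                        .format("hudi")
                        .schema(schema)
                        .load(paths.toArray(new String[0]));
df2 = df2.filter(df2.col("id").isin(idVals)); // idVals: Object[] of ids

Would this kind of explicit path list be preferable, or does the glob + filter
approach already prune partitions on its own?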