Team, I have a question on keeping Hive in sync. Because a shared Hadoop environment prevents me from using Hudi 0.5.1 or any higher version, I ended up using 0.5.0. My Hadoop cluster currently runs Hive 1.2.x, which does not support Hudi's Hive sync.
So I am not using the Hive sync feature. Instead, I read the data directly, like this:

    sparkSession
      .read
      .format("org.apache.hudi")
      .load("/projects/cdp/data/base/request_application/*/*")
      .createOrReplaceTempView(s"base_request_application")

I am going to store 3 years' worth of data partitioned by day/hour. Once 3 years of data are loaded, that comes to 3 * 365 * 24 = 26,280 directories. With the approach above, every read lists and indexes all of those directory names. Would it hurt performance when joining with other tables if I don't use the Hive way of partition pruning?
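To make the comparison concrete, here is a minimal sketch of the two read patterns I have in mind. The date value and the second view name are only illustrative, and I am assuming the layout under the base path is <day>/<hour>:

    // Pattern 1: wildcard glob over every day/hour directory (26,280 of them);
    // any filtering happens only after the full directory listing.
    val fullDf = sparkSession.read
      .format("org.apache.hudi")
      .load("/projects/cdp/data/base/request_application/*/*")
    fullDf.createOrReplaceTempView("base_request_application")

    // Pattern 2: narrow the glob to the partitions a query actually needs
    // (illustrative date), so only the matching directories are listed
    // before the join runs.
    val oneDayDf = sparkSession.read
      .format("org.apache.hudi")
      .load("/projects/cdp/data/base/request_application/2020-03-01/*")
    oneDayDf.createOrReplaceTempView("base_request_application_day")

Thanks,
Selva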