I'm working on a data lake solution for an IoT framework that does 44 kHz data acquisition for a few dozen sensors (~990,000 measurements/second), and I would like suggestions on how to build an efficient data ingestion pipeline using Java 11+, Apache Arrow, and Apache Parquet.
For ingestion I am currently using the AvroParquetWriter implementation from https://github.com/apache/parquet-mr, and I would like to partition the dataset by two fields: timestamp and sensor name. I can't find examples of creating partitioned datasets with this API, and I'm open to switching away from the Parquet file write API if needed. The solution does not need to support distributed/clustered processing; it only has to split the partitions into separate directories on the local filesystem. A minimal sketch of the manual partitioning I have in mind is below.

For reference, I currently use DataFusion to query the datasets written by AvroParquetWriter.
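To make the question concrete, here is a rough sketch (not production code) of the Hive-style partitioning done "by hand" on top of AvroParquetWriter: one open writer per `sensor_name=<name>/date=<yyyy-MM-dd>` directory, kept in a map and closed at shutdown. The `PartitionedSensorWriter` class and the `Measure` schema are illustrative assumptions for this example, not something I have running at full rate:

```java
import java.io.IOException;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

/**
 * Illustrative sketch only: manual Hive-style partitioning on top of
 * AvroParquetWriter, with one writer per (sensor, date) partition directory.
 * Class, schema, and field names are assumptions made for this example.
 */
public class PartitionedSensorWriter implements AutoCloseable {

    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    // Assumed record layout: (timestamp in microseconds, sensor name, value).
    private static final Schema SCHEMA = SchemaBuilder.record("Measure")
            .fields()
            .requiredLong("timestamp")
            .requiredString("sensor_name")
            .requiredDouble("value")
            .endRecord();

    private final String baseDir;
    private final Map<String, ParquetWriter<GenericRecord>> writers = new HashMap<>();

    public PartitionedSensorWriter(String baseDir) {
        this.baseDir = baseDir;
    }

    public void write(long timestampMicros, String sensorName, double value) throws IOException {
        // Hive-style partition path: <base>/sensor_name=<name>/date=<yyyy-MM-dd>/
        String day = DAY.format(Instant.ofEpochMilli(timestampMicros / 1000));
        String partition = "sensor_name=" + sensorName + "/date=" + day;

        ParquetWriter<GenericRecord> writer = writers.get(partition);
        if (writer == null) {
            // One file per partition for simplicity; a real ingester would roll files by size.
            Path file = new Path(baseDir + "/" + partition + "/part-0.parquet");
            writer = AvroParquetWriter.<GenericRecord>builder(file)
                    .withSchema(SCHEMA)
                    .withCompressionCodec(CompressionCodecName.SNAPPY)
                    .build();
            writers.put(partition, writer);
        }

        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("timestamp", timestampMicros);
        record.put("sensor_name", sensorName);
        record.put("value", value);
        writer.write(record);
    }

    @Override
    public void close() throws IOException {
        for (ParquetWriter<GenericRecord> w : writers.values()) {
            w.close();
        }
    }
}
```

My understanding is that DataFusion can expose the `sensor_name=.../date=...` directories as partition columns when the dataset is registered as a listing table, but confirming that, and whether this manual approach is sensible at ~990,000 rows/second, is part of what I'm asking.

Regards, João Antonio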