I have a Spark Streaming app that is generating a lot of small Parquet files,
and I need a way to compact them into bigger Parquet files. I put something
together that does the job for me, similar to the merge command from
parquet-tools:
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.hadoop.ParquetFileWriter;
    import org.apache.parquet.hadoop.metadata.FileMetaData;

    // Merge the footers of the input files, then append each file's
    // row groups unchanged into a single output file.
    FileMetaData mergedMeta = mergedMetadata(inputFiles);

    ParquetFileWriter writer = new ParquetFileWriter(conf,
        mergedMeta.getSchema(), outputFile, ParquetFileWriter.Mode.CREATE);

    writer.start();
    for (Path input : inputFiles) {
      writer.appendFile(conf, input);  // copies row groups as-is, no re-encoding
      hdfs.delete(input, false);       // drop the small file once appended
    }
    writer.end(mergedMeta.getKeyValueMetaData());
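
mergedMetadata is not shown above; a minimal sketch of one way to write it,
assuming parquet-hadoop's static ParquetFileWriter.mergeMetadataFiles is
available (the helper name and surrounding class are just illustrative):

    // Also needs: java.io.IOException, java.util.List
    // Merge the footers of all input files into one ParquetMetadata and
    // return its FileMetaData (merged schema + key/value metadata).
    private FileMetaData mergedMetadata(List<Path> inputFiles) throws IOException {
      return ParquetFileWriter.mergeMetadataFiles(inputFiles, conf).getFileMetaData();
    }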
I need to make sure the column statistics (min/max) are up to date on these
newly generated files. How can I do that? I am using parquet-tools 1.9.0 and
I don't see any stats there. Any ideas?
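
For reference, a minimal sketch of how I would expect to inspect the merged
file's footer programmatically, assuming parquet-hadoop's ParquetFileReader
(outputFile and conf as above; Statistics.toString should print min/max when
they are present):

    import org.apache.parquet.format.converter.ParquetMetadataConverter;
    import org.apache.parquet.hadoop.ParquetFileReader;
    import org.apache.parquet.hadoop.metadata.BlockMetaData;
    import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
    import org.apache.parquet.hadoop.metadata.ParquetMetadata;

    // Read the merged file's footer and print the per-row-group,
    // per-column statistics, if any were written.
    ParquetMetadata footer =
        ParquetFileReader.readFooter(conf, outputFile, ParquetMetadataConverter.NO_FILTER);
    for (BlockMetaData block : footer.getBlocks()) {
      for (ColumnChunkMetaData column : block.getColumns()) {
        System.out.println(column.getPath() + " -> " + column.getStatistics());
      }
    }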