I have a Spark Streaming app that is generating a lot of small Parquet
files, and I need a way to combine them into big Parquet files.

I tried to put something together that does the job for me (similar to the
merge command in parquet-tools):

    FileMetaData mergedMeta = mergedMetadata(inputFiles);

    ParquetFileWriter writer = new ParquetFileWriter(conf,
        mergedMeta.getSchema(), outputFile, ParquetFileWriter.Mode.CREATE);
    writer.start();
    for (Path input : inputFiles) {
        writer.appendFile(conf, input);
        hdfs.delete(input, false);
    }
    writer.end(mergedMeta.getKeyValueMetaData());

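For what it's worth, one way I can check whether the merged file still carries column statistics is to read its footer directly with parquet-hadoop, instead of going through parquet-tools. This is just a sketch under my setup (parquet-hadoop 1.9.0 and the Hadoop client jars on the classpath; the path argument is the merged output file from above):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;

public class CheckStats {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path merged = new Path(args[0]);

        // Read only the footer; no row data is deserialized.
        ParquetMetadata footer = ParquetFileReader.readFooter(conf, merged);

        // appendFile() copies row groups over verbatim, so every input
        // file's row groups (with their column chunk metadata) show up here.
        for (BlockMetaData block : footer.getBlocks()) {
            for (ColumnChunkMetaData column : block.getColumns()) {
                System.out.println(column.getPath() + " -> "
                        + column.getStatistics());
            }
        }
    }
}
```

If the statistics print here but not in parquet-tools, the stats are present in the footer and it is the tool's display that is the problem.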


I need to make sure the stats are up to date on these newly generated files.
How can I do that? I am using parquet-tools 1.9.0 and I don't see the stats
there. Any ideas?
