Replies inline.

On Wed, May 22, 2019 at 8:40 AM Brian Bowman <[email protected]> wrote:

> Questions:
>
> 1. Is this the “standard” for creating/saving a .parquet data set?

File names are specific to the application that creates them. Iceberg, for example, adds the task attempt number to ensure that no attempts try to write to the same location. Some engines like Spark also include bucket information in the file name.

> 2. It appears that “84abe50-a92b-4b2b-b011-30990891fb83” is a UUID. Is
> the format:
> part-fileSeq#-UUID.parquet or part-fileSeq#-UUID.parquet.crc
> an established convention? Is this documented somewhere?

The .crc file is created by the ChecksumFileSystem. Its name is always .(data-file-name).crc

> 3. Is there a C++ class to create the CRC?

There is a C++ implementation of HDFS, but I don’t know if there is a local FS that supports .crc files in C++.

--
Ryan Blue
Software Engineer
Netflix
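
For illustration only, the .crc naming rule mentioned above (a leading dot, the data file name, then .crc, in the same directory) can be sketched in a few lines of Python. This is not Hadoop code: the helper names checksum_path and is_checksum_file and the example path are hypothetical, and the sketch only computes names; it does not compute or verify any checksum.

```python
from pathlib import PurePosixPath


def checksum_path(data_file: str) -> PurePosixPath:
    """Return the sidecar path '.<data-file-name>.crc' next to data_file.

    Hypothetical helper: it only mirrors the naming rule described in the
    reply and does not read or write any file.
    """
    p = PurePosixPath(data_file)
    return p.parent / f".{p.name}.crc"


def is_checksum_file(path: str) -> bool:
    """Heuristic name check: hidden file whose name ends in '.crc'."""
    name = PurePosixPath(path).name
    return name.startswith(".") and name.endswith(".crc")


# Example with a made-up file name; 'UUID' stands in for a real UUID.
print(checksum_path("/data/tbl/part-00000-UUID.parquet"))
```

A reader that lists a directory of Parquet files could use a check like is_checksum_file to skip the sidecar files rather than try to open them as data.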
