I was trying to write a Parquet file with delta encoding. This page <https://github.com/apache/parquet-format/blob/master/Encodings.md> states that Parquet supports three types of delta encoding:
(DELTA_BINARY_PACKED, DELTA_LENGTH_BYTE_ARRAY, DELTA_BYTE_ARRAY). Since Spark, PySpark, and PyArrow do not let us specify the encoding method, I was curious how one can write a file with delta encoding enabled. I found on the internet that Parquet will use delta encoding for columns of timestamp type, so I used the following code in *Scala* to create a Parquet file. However, the resulting encoding is not delta.

```scala
import org.apache.spark.sql.functions.col
import spark.implicits._ // enables .toDF; `spark` is the SparkSession (predefined in spark-shell)

val df = Seq("2018-05-01", "2018-05-02", "2018-05-03", "2018-05-04", "2018-05-05",
  "2018-05-06", "2018-05-07", "2018-05-08", "2018-05-09", "2018-05-10").toDF("Id")
val df2 = df.withColumn("Timestamp", col("Id").cast("timestamp"))
val df3 = df2.withColumn("Date", col("Id").cast("date"))
df3.coalesce(1).write.format("parquet").mode("append").save("date_time2")
```

parquet-tools shows the following information about the written Parquet file:

```
file schema: spark_schema
--------------------------------------------------------------------------------
Id: OPTIONAL BINARY L:STRING R:0 D:1
Timestamp: OPTIONAL INT96 R:0 D:1
Date: OPTIONAL INT32 L:DATE R:0 D:1

row group 1: RC:31 TS:1100 OFFSET:4
--------------------------------------------------------------------------------
Id: BINARY SNAPPY DO:0 FPO:4 SZ:230/487/2.12 VC:31 ENC:RLE,PLAIN,BIT_PACKED ST:[min: 2018-05-01, max: 2018-05-31, num_nulls: 0]
Timestamp: INT96 SNAPPY DO:0 FPO:234 SZ:212/436/2.06 VC:31 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[num_nulls: 0, min/max not defined]
Date: INT32 SNAPPY DO:0 FPO:446 SZ:181/177/0.98 VC:31 ENC:RLE,PLAIN,BIT_PACKED ST:[min: 2018-05-01, max: 2018-05-31, num_nulls: 0]
```

As you can see, no column uses delta encoding. My questions are:

1. How can I write a Parquet file with delta encoding? (Example code in Scala or Python would be great.)
2. How do I decide which delta encoding (DELTA_BINARY_PACKED, DELTA_LENGTH_BYTE_ARRAY, DELTA_BYTE_ARRAY) to use?
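A minimal sketch of one possible approach, assuming Spark 3.x with parquet-mr 1.10 or later on the classpath and a local filesystem. The assumptions are that Spark's Parquet writer honours the parquet-mr Hadoop keys `parquet.writer.version` (set to `v2`) and `parquet.enable.dictionary` (set to `false`), and that `spark.sql.parquet.outputTimestampType=TIMESTAMP_MICROS` makes timestamps INT64 instead of the default INT96. The object name `DeltaEncodingSketch` and the output directory `date_time_delta` are hypothetical. Instead of parquet-tools, the sketch reads each part file's footer with `ParquetFileReader` and prints the encodings recorded per column chunk. I have not confirmed this for every Spark/parquet-mr version.

```scala
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import scala.collection.JavaConverters._

// Hypothetical object name; a sketch, not a definitive recipe.
object DeltaEncodingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("delta-encoding-sketch").getOrCreate()
    import spark.implicits._

    val hadoopConf = spark.sparkContext.hadoopConfiguration
    // Assumption: "v2" selects parquet-mr's V2 value writers, which include the delta encodings.
    hadoopConf.set("parquet.writer.version", "v2")
    // Assumption: dictionary encoding is tried first by default; disabling it
    // should let the delta writer be used directly.
    hadoopConf.set("parquet.enable.dictionary", "false")
    // Spark writes timestamps as INT96 by default; INT64 (micros) is an integer type.
    spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")

    val out = "date_time_delta" // hypothetical output directory
    val df = (1 to 10).map(d => f"2018-05-$d%02d").toDF("Id")
      .withColumn("Timestamp", col("Id").cast("timestamp"))
      .withColumn("Date", col("Id").cast("date"))
    df.coalesce(1).write.mode("overwrite").parquet(out)

    // Print the encodings stored in the footer of every part file.
    val parts = new java.io.File(out).listFiles.filter(_.getName.endsWith(".parquet"))
    for (f <- parts) {
      val reader = ParquetFileReader.open(HadoopInputFile.fromPath(new Path(f.getAbsolutePath), hadoopConf))
      try {
        for (block <- reader.getFooter.getBlocks.asScala; c <- block.getColumns.asScala)
          println(s"${c.getPath}: ${c.getEncodings.asScala.mkString(",")}")
      } finally reader.close()
    }
    spark.stop()
  }
}
```

If this behaves as I expect, the footer should list DELTA_BINARY_PACKED for the INT32 `Date` and INT64 `Timestamp` columns and DELTA_BYTE_ARRAY for the string `Id` column. As far as I understand parquet-mr's default writers, the encoding is chosen from the column's physical type rather than configured per column: the default v1 writer does not use delta encodings at all, INT96 values are not delta-encoded (which would explain why the original INT96 `Timestamp` column got PLAIN_DICTIONARY), and DELTA_LENGTH_BYTE_ARRAY is not picked by default for byte arrays.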