srilman opened a new issue, #4921:
URL: https://github.com/apache/iceberg/issues/4921
I'm trying to access some of the metric-related information (i.e.
`lower_bounds`, `upper_bounds`, `distinct_counts`) after performing a scan using the
Java API. I've confirmed (by looking in the manifest files) that these pieces
of metadata are written. However, the Java API returns null for all of these
values.
Minimal Reproducer:
Spark Code to Generate the Tables:
```python
from datetime import datetime

import numpy as np
import pandas as pd
import pyspark.sql.types as spark_types
from pyspark.sql import SparkSession

# Assumed placeholder: the database name is not given in this report.
DATABASE_NAME = "test_db"


def create_table(table_name="test_table"):
    # Spark session with a Hadoop-type Iceberg catalog rooted at the current directory
    spark = (
        SparkSession.builder.appName("Iceberg with Spark")
        .config(
            "spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0",
        )
        .config("spark.sql.catalog.hadoop_prod", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.hadoop_prod.type", "hadoop")
        .config("spark.sql.catalog.hadoop_prod.warehouse", ".")
        .config(
            "spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        )
        .getOrCreate()
    )

    # 25 rows: an int32 index and one date per year starting in 2010
    df = pd.DataFrame(
        {
            "index": np.arange(25, dtype=np.int32),
            "dates": pd.Series(
                [datetime.strptime(f"12/11/{2010 + x}", "%d/%m/%Y") for x in range(25)]
            ),
        }
    )
    schema = spark_types.StructType(
        [
            spark_types.StructField("index", spark_types.IntegerType(), False),
            spark_types.StructField("dates", spark_types.DateType(), False),
        ]
    )
    df = spark.createDataFrame(df, schema=schema)

    # Write as a format-version 2 table with merge-on-read deletes
    df.writeTo(f"hadoop_prod.{DATABASE_NAME}.{table_name}").tableProperty(
        "format-version", "2"
    ).tableProperty("write.delete.mode", "merge-on-read").createOrReplace()


if __name__ == "__main__":
    create_table()
```
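Java Code to Read Metadata w/ Table Scan:
```java
import java.io.IOException;

import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.Table;
import org.apache.iceberg.TableScan;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.Expressions;
import org.apache.iceberg.hadoop.HadoopTables;
import org.apache.iceberg.io.CloseableIterable;

public class IcebergTester {
    public static void main(String[] args) throws IOException {
        HadoopTables catalog = new HadoopTables();
        System.setProperty("user.dir", ...);
        Table table = catalog.load(...);

        // Plan a scan that only keeps files that may contain index > 10
        Expression filter = Expressions.greaterThan("index", 10);
        TableScan scan = table.newScan().filter(filter);

        try (CloseableIterable<FileScanTask> fileTasks = scan.planFiles()) {
            for (FileScanTask fileTask : fileTasks) {
                // Print the per-file column bounds; both come back null
                System.out.print("Lower Bounds ");
                System.out.println(fileTask.file().lowerBounds());
                System.out.print("Upper Bounds ");
                System.out.println(fileTask.file().upperBounds());
                System.out.println();
            }
        }
    }
}
```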
Output:
```
Lower Bounds null
Upper Bounds null
Lower Bounds null
Upper Bounds null
...
```
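For reference, `lowerBounds()` and `upperBounds()` return maps from field ID to `ByteBuffer`, so even a non-null result would print as raw buffers rather than readable values. Below is a minimal sketch of how I would expect to decode them for this table's two columns, assuming the single-value serialization described in the Iceberg spec (`int` as a 4-byte little-endian value, `date` as days since 1970-01-01 in the same encoding). `BoundsDecoder` and its methods are hypothetical helpers written for illustration, not part of the Iceberg API, and the sketch uses only the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.LocalDate;

// Hypothetical helper, not part of Iceberg: decodes bound values for int and date columns.
final class BoundsDecoder {
    // Assumes the value is stored as a 4-byte little-endian int starting at the buffer's position.
    static int decodeInt(ByteBuffer buf) {
        // duplicate() so the caller's position and byte order are left untouched
        return buf.duplicate().order(ByteOrder.LITTLE_ENDIAN).getInt(buf.position());
    }

    // Assumes a date is stored as days since 1970-01-01 using the same int encoding.
    static LocalDate decodeDate(ByteBuffer buf) {
        return LocalDate.ofEpochDay(decodeInt(buf));
    }

    // Builds a buffer in the assumed encoding, used here only to exercise the decoders.
    static ByteBuffer encodeInt(int value) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(0, value);
    }

    public static void main(String[] args) {
        ByteBuffer indexBound = encodeInt(11);
        ByteBuffer dateBound = encodeInt((int) LocalDate.of(2010, 11, 12).toEpochDay());

        System.out.println("index bound: " + decodeInt(indexBound));
        System.out.println("dates bound: " + decodeDate(dateBound));
    }
}
```

In the reproducer above, the buffers would come from `fileTask.file().lowerBounds().get(fieldId)`, with the field IDs looked up from the table schema.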