(iceberg) branch main updated: Docs: Enhance Java quickstart example (#9585)

blue Sun, 04 Feb 2024 09:30:17 -0800

This is an automated email from the ASF dual-hosted git repository.

blue pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git



The following commit(s) were added to refs/heads/main by this push:
     new f8a4a74584 Docs: Enhance Java quickstart example (#9585)
f8a4a74584 is described below

commit f8a4a74584f535d41c21d5cac4457c603eb448a1
Author: Manu Zhang <[email protected]>
AuthorDate: Mon Feb 5 01:30:03 2024 +0800

    Docs: Enhance Java quickstart example (#9585)
---
 docs/docs/java-api-quickstart.md | 60 ++++++++++++++--------------------------
 1 file changed, 21 insertions(+), 39 deletions(-)

diff --git a/docs/docs/java-api-quickstart.md b/docs/docs/java-api-quickstart.md
index f9d0644b50..b584a01455 100644
--- a/docs/docs/java-api-quickstart.md
+++ b/docs/docs/java-api-quickstart.md
@@ -30,13 +30,15 @@ The Hive catalog connects to a Hive metastore to keep track of Iceberg tables.
 You can initialize a Hive catalog with a name and some properties.
 (see: [Catalog properties](configuration.md#catalog-properties))
 
-**Note:** Currently, `setConf` is always required for hive catalogs, but this will change in the future.
 
 ```java
+import java.util.HashMap
+import java.util.Map
+
 import org.apache.iceberg.hive.HiveCatalog;
 
 HiveCatalog catalog = new HiveCatalog();
-catalog.setConf(spark.sparkContext().hadoopConfiguration());  // Configure using Spark's Hadoop configuration
+catalog.setConf(spark.sparkContext().hadoopConfiguration());  // Optionally use Spark's Hadoop configuration
 
 Map <String, String> properties = new HashMap<String, String>();
 properties.put("warehouse", "...");
@@ -45,8 +47,7 @@ properties.put("uri", "...");
 catalog.initialize("hive", properties);
 ```
 
-The `Catalog` interface defines methods for working with tables, like `createTable`, `loadTable`, `renameTable`, and `dropTable`. `HiveCatalog` implements the `Catalog` interface.
-
+`HiveCatalog` implements the `Catalog` interface, which defines methods for working with tables, like `createTable`, `loadTable`, `renameTable`, and `dropTable`.
 To create a table, pass an `Identifier` and a `Schema` along with other initial metadata:
 
 ```java
@@ -57,10 +58,10 @@ TableIdentifier name = TableIdentifier.of("logging", "logs");
 Table table = catalog.createTable(name, schema, spec);
 
 // or to load an existing table, use the following line
-// Table table = catalog.loadTable(name);
+Table table = catalog.loadTable(name);
 ```
 
-The logs [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below.
+The table's [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below.
 
 
 ### Using a Hadoop catalog
@@ -88,45 +89,26 @@ TableIdentifier name = TableIdentifier.of("logging", "logs");
 Table table = catalog.createTable(name, schema, spec);
 
 // or to load an existing table, use the following line
-// Table table = catalog.loadTable(name);
+Table table = catalog.loadTable(name);
 ```
 
-The logs [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below.
-
+The table's [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below.
 
-### Using Hadoop tables
 
-Iceberg also supports tables that are stored in a directory in HDFS. Concurrent writes with a Hadoop tables are not safe when stored in the local FS or S3. Directory tables don't support all catalog operations, like rename, so they use the `Tables` interface instead of `Catalog`.
+### Tables in Spark
 
-To create a table in HDFS, use `HadoopTables`:
+Spark can work with table by name using `HiveCatalog`.
 
 ```java
-import org.apache.hadoop.conf.Configuration;
-import org.apache.iceberg.hadoop.HadoopTables;
-import org.apache.iceberg.Table;
-
-Configuration conf = new Configuration();
-HadoopTables tables = new HadoopTables(conf);
-Table table = tables.create(schema, spec, table_location);
-
-// or to load an existing table, use the following line
-// Table table = tables.load(table_location);
+// spark.sql.catalog.hive_prod = org.apache.iceberg.spark.SparkCatalog
+// spark.sql.catalog.hive_prod.type = hive
+spark.table("logging.logs");
 ```
 
-!!! danger
-    Hadoop tables shouldn't be used with file systems that do not support atomic rename. Iceberg relies on rename to synchronize concurrent commits for directory tables.
-
-
-### Tables in Spark
-
-Spark uses both `HiveCatalog` and `HadoopTables` to load tables. Hive is used when the identifier passed to `load` or `save` is not a path, otherwise Spark assumes it is a path-based table.
-
-To read and write to tables from Spark see:
-
-* [SQL queries in Spark](spark-queries.md#querying-with-sql)
-* [`INSERT INTO` in Spark](spark-writes.md#insert-into)
-* [`MERGE INTO` in Spark](spark-writes.md#merge-into)
-
+Spark can also load table created by `HadoopCatalog` by path.
+```java
+spark.read.format("iceberg").load("hdfs://host:8020/warehouse_path/logging/logs");
+```
 
 ## Schemas
 
@@ -170,7 +152,7 @@ To create an Iceberg schema from an existing table, use converters in `SparkSche
 ```java
 import org.apache.iceberg.spark.SparkSchemaUtil;
 
-Schema schema = SparkSchemaUtil.schemaForTable(sparkSession, table_name);
+Schema schema = SparkSchemaUtil.schemaForTable(sparkSession, tableName);
 ```
 
 ## Partitioning
@@ -240,9 +222,9 @@ table.newRowDelta()
     .commit();
 
 
-// Perform a rewrite operation replacing small_file_1 and small_file_2 on "test-branch" with compacted_file.
+// Perform a rewrite operation replacing SMALL_FILE_1 and SMALL_FILE_2 on "test-branch" with compactedFile.
 table.newRewrite()
-    .rewriteFiles(ImmutableSet.of(small_file_1,small_file_2), ImmutableSet.of(compacted_file))
+    .rewriteFiles(ImmutableSet.of(SMALL_FILE_1, SMALL_FILE_2), ImmutableSet.of(compactedFile))
     .toBranch(branch)
     .commit();
