rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336108312
##########
File path: site/docs/python-api-intro.md
##########
@@ -0,0 +1,143 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg Python API
+
+Much of the python api conforms to the java api. You can get more info about the java api [here](https://iceberg.apache.org/api/).
+
+
+## Tables
+
+The Table interface provides access to table metadata
+
++ schema returns the current table schema
++ spec returns the current table partition spec
++ properties returns a map of key-value properties
++ currentSnapshot returns the current table snapshot
++ snapshots returns all valid snapshots for the table
++ snapshot(id) returns a specific snapshot by ID
++ location returns the table’s base location
+
+Tables also provide refresh to update the table to the latest version.
+
+### Scanning
+Iceberg table scans start by creating a TableScan object with newScan.
+
+``` python
+scan = table.new_scan();
+```
+
+To configure a scan, call filter and select on the TableScan to get a new TableScan with those changes.
+
+``` python
+filtered_scan = scan.filter(Expressions.equal("id", 5))
+```
+
+String expressions can also be passed to the filter method.
+
+``` python
+filtered_scan = scan.filter("id=5")
+```
+
+Schema projections can be applied against a TableScan by passing a list of column names.
+
+``` python
+filtered_scan = scan.select(["col_1", "col_2", "col_3"])
+```
+
+Because some data types cannot be read using the python library, a convenience method for excluding columns from projection is provided.
+
+``` python
+filtered_scan = scan.select_except(["unsupported_col_1", "unsupported_col_2"])
+```
+
+
+Calls to configuration methods create a new TableScan so that each TableScan is immutable.
+
+When a scan is configured, planFiles, planTasks, and schema are used to return files, tasks, and the read projection.
+
+``` python
+scan = table.new_scan() \
+    .filter("id=5") \
+    .select(["id", "data"])
+
+projection = scan.schema
+for task in scan.plan_tasks():
+    print(task)
+```
+
+## Types
+
+Iceberg data types are located in iceberg.api.types.types
+
+### Primitives
+
+Primitive type instances are available from static methods in each type class. Types without parameters use get, and types like __decimal__ use factory methods:
+
+```python
+IntegerType.get() # int
+DoubleType.get() # double
+DecimalType.of(9, 2) # decimal(9, 2)
+```
+
+### Nested types
+Structs, maps, and lists are created using factory methods in type classes.
+
+Like struct fields, map keys or values and list elements are tracked as nested fields. Nested fields track [field IDs](https://iceberg.apache.org/evolution/#correctness) and nullability.
+
+Struct fields are created using __NestedField.optional__ or __NestedField.required__. Map value and list element nullability is set in the map and list factory methods.

Review comment:
For method names, we typically use fixed-width font, like this:

```
... using `NestedField.optional` or `NestedField.required`. Map value ...
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.