jorisvandenbossche commented on a change in pull request #75:
URL: https://github.com/apache/arrow-cookbook/pull/75#discussion_r712768403



##########
File path: python/source/schema.rst
##########
@@ -108,4 +108,87 @@ as far as they are compatible
     pyarrow.Table
     col1: int32
     col2: string
-    col3: double
\ No newline at end of file
+    col3: double
+
+Merging multiple schemas
+========================
+
+When you have multiple separate groups of data that you want to combine
+it might be necessary to unify their schemas to create a superset of them
+that applies to all data sources.
+
+.. testcode::
+
+    import pyarrow as pa
+
+    first_schema = pa.schema([
+        ("country", pa.string()),
+        ("population", pa.int32())
+    ])
+
+    second_schema = pa.schema([
+        ("country_code", pa.string()),
+        ("language", pa.string())
+    ])
+
+:func:`unify_schemas` can be used to combine multiple schemas into
+a single one:
+
+.. testcode::
+
+    union_schema = pa.unify_schemas([first_schema, second_schema])
+
+    print(union_schema)
+
+.. testoutput::
+
+    country: string
+    population: int32
+    country_code: string
+    language: string
+
+If the combined schemas have overlapping columns, they can still be combined
+as far as the colliding columns retain the same type (``country_code``):

Review comment:
       Yes, but as far as I understand the unify function, its primary intended 
use case is to unify slightly different schemas (e.g. different field ordering 
or a missing column), not to merge schemas in a typical "join" context where 
you join two different tables (although that is of course also possible). 
   So while the second use might be easier to explain on its own, IMO it doesn't 
really help to explain the typical use of `unify_schemas`. 
   But as I said, take it or leave it :) Those things will always be 
inherently subjective, so maybe I should not have reacted again. 





