smaheshwar-pltr commented on code in PR #1452:
URL: https://github.com/apache/iceberg-python/pull/1452#discussion_r1893325364


##########
pyiceberg/io/pyarrow.py:
##########
@@ -2622,13 +2625,18 @@ def _dataframe_to_data_files(
         property_name=TableProperties.WRITE_TARGET_FILE_SIZE_BYTES,
         default=TableProperties.WRITE_TARGET_FILE_SIZE_BYTES_DEFAULT,
     )
+    location_provider = load_location_provider(

Review Comment:
   I don't love this. I wanted to do something like 
[this](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SerializableTable.java#L260-L268)
 and cache the provider on at least the `Transaction` (which is the only 
caller of this method). The problem, I think, is that properties can change 
on the `Transaction`, which could change which location provider should be 
used. I suppose we could refresh the provider on a property change (or maybe 
on any metadata change), but I'm unsure whether that complexity is worth it.
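
For illustration, the "refresh on property change" idea could be sketched roughly as below. This is not code from the PR: `CachedProviderHolder`, its `factory` argument, and the `builds` counter are hypothetical names, and the sketch assumes the provider depends only on the table location and the current properties.

```python
from typing import Callable, Dict, Optional, Tuple

Properties = Dict[str, str]


class CachedProviderHolder:
    """Hypothetical helper: rebuilds the provider only when properties change."""

    def __init__(self, table_location: str, factory: Callable[[str, Properties], object]) -> None:
        self._table_location = table_location
        self._factory = factory  # stand-in for a loader such as load_location_provider
        self._cached_key: Optional[Tuple[Tuple[str, str], ...]] = None
        self._cached_provider: Optional[object] = None
        self.builds = 0  # counts rebuilds, only to make the behaviour observable

    def get(self, properties: Properties) -> object:
        # A sorted tuple of items serves as a cheap, hashable snapshot of the properties.
        key = tuple(sorted(properties.items()))
        if key != self._cached_key:
            self._cached_provider = self._factory(self._table_location, dict(properties))
            self._cached_key = key
            self.builds += 1
        return self._cached_provider
```

Comparing a snapshot on every call avoids hooking into each place where the properties can be mutated, at the cost of sorting the properties per lookup. Whether that trade is better than reloading the provider each time is exactly the open question above.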



##########
pyiceberg/io/__init__.py:
##########
@@ -344,6 +370,40 @@ def _infer_file_io_from_scheme(path: str, properties: Properties) -> Optional[Fi
     return None
 
 
+def _import_location_provider(location_provider_impl: str, table_location: str, table_properties: Properties) -> Optional[LocationProvider]:
+    try:
+        path_parts = location_provider_impl.split(".")
+        if len(path_parts) < 2:
+            raise ValueError(f"{TableProperties.LOCATION_PROVIDER_IMPL_DEFAULT} should be full path (module.CustomLocationProvider), got: {location_provider_impl}")
+        module_name, class_name = ".".join(path_parts[:-1]), path_parts[-1]
+        module = importlib.import_module(module_name)
+        class_ = getattr(module, class_name)

Review Comment:
   Can we reduce the duplication between this and the `FileIO` import logic?
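
One possible shape for this, as a rough sketch rather than a concrete proposal: pull the "split the dotted path, import the module, fetch the class" steps into a single helper that both loaders call. `import_class` and its `property_name` parameter are hypothetical names, not existing pyiceberg functions.

```python
import importlib
from typing import Any


def import_class(impl: str, property_name: str) -> Any:
    """Resolve 'package.module.ClassName' to the class object (hypothetical shared helper)."""
    # rpartition splits on the last dot: everything before it is the module path.
    module_name, _, class_name = impl.rpartition(".")
    if not module_name or not class_name:
        raise ValueError(f"{property_name} should be full path (module.ClassName), got: {impl}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

Each caller would keep its own error handling and constructor call, since the two loaders pass different arguments to the class they instantiate.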



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

