robertnagy1 opened a new issue, #1383: URL: https://github.com/apache/sedona/issues/1383
## Expected behavior

Running df = sedona.read.format("geoparquet").load("/lakehouse/default/Files/samples/parquet/buildings.parquet") should return a Spark DataFrame.

## Actual behavior

Returns an error:

Py4JJavaError                             Traceback (most recent call last)
Cell In[16], line 1
----> 1 sedona.read.format("geoparquet").load("/lakehouse/default/Files/samples/parquet/buildings.parquet")

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py:300, in DataFrameReader.load(self, path, format, schema, **options)
    298 self.options(**options)
    299 if isinstance(path, str):
--> 300     return self._df(self._jreader.load(path))
    301 elif path is not None:
    302     if type(path) != list:

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:169, in capture_sql_exception.<locals>.deco(*a, **kw)
    167 def deco(*a: Any, **kw: Any) -> Any:
    168     try:
--> 169         return f(*a, **kw)
    170     except Py4JJavaError as e:
    171         converted = convert_exception(e.java_exception)

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o6230.load.
: Operation failed: "Bad Request", 400, HEAD, http://onelake.dfs.fabric.microsoft.com/§redacted§/lakehouse/default/Files/samples/parquet/buildings.parquet?upn=false&action=getStatus&timeout=90
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.completeExecute(AbfsRestOperation.java:231)
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.lambda$execute$0(AbfsRestOperation.java:191)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation(IOStatisticsBinding.java:464)
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:189)
    at org.apache.hadoop.fs.azurebfs.services.AbfsClient.getPathStatus(AbfsClient.java:690)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:1053)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:650)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:640)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.exists(AzureBlobFileSystem.java:1236)
    at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4(DataSource.scala:757)
    at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4$adapted(DataSource.scala:755)
    at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:393)
    at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
    at scala.util.Success.$anonfun$map$1(Try.scala:255)
    at scala.util.Success.map(Try.scala:213)
    at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
    at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
    at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
    at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

## Steps to reproduce the problem

Add a Parquet file to the default Lakehouse for a workspace, then try to read it from its path.

Something I noticed: Fabric mounts in the Lakehouse can be reached using the Python os package, but not with the notebookutils.mssparkutils Python package. mssparkutils returns the DFS location with the UUID rather than the alias for the mounted path. This might be relevant.

## Settings

Sedona version = 1.5.1
Apache Spark version = 3.4
Apache Flink version = Not applicable
API type = Python
Scala version = 2.12
JRE version = Cluster default
Python version = Cluster default
Environment = Microsoft Fabric
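Since the mounted alias path and the DFS location appear to differ, one thing that may be worth trying is passing Spark a fully qualified OneLake URI instead of the /lakehouse/default/... alias. The following is an untested sketch, not a confirmed fix. The helper name `mount_path_to_abfss` is hypothetical, the workspace and lakehouse names are placeholders, and the `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/...` layout is an assumption about how OneLake addresses files, not something stated in this report.

```python
# Assumption: OneLake files are addressable as
#   abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/<path>
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"

# Local mount alias used in the failing call from this report.
MOUNT_PREFIX = "/lakehouse/default/"


def mount_path_to_abfss(mount_path: str, workspace: str, lakehouse: str) -> str:
    """Translate a /lakehouse/default/... alias into a fully qualified abfss URI.

    Hypothetical helper: `workspace` and `lakehouse` are names the caller
    must supply; they cannot be derived from the alias itself.
    """
    if not mount_path.startswith(MOUNT_PREFIX):
        raise ValueError(f"expected a path under {MOUNT_PREFIX!r}, got {mount_path!r}")
    # Keep everything after the mount prefix, e.g. "Files/samples/parquet/...".
    relative = mount_path[len(MOUNT_PREFIX):]
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/{relative}"


uri = mount_path_to_abfss(
    "/lakehouse/default/Files/samples/parquet/buildings.parquet",
    workspace="MyWorkspace",   # placeholder
    lakehouse="MyLakehouse",   # placeholder
)
print(uri)
```

In a Fabric notebook, the resulting string would then be passed to the same reader as before, i.e. sedona.read.format("geoparquet").load(uri). Whether this avoids the 400 response has not been verified here.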