[ https://issues.apache.org/jira/browse/SPARK-37882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475967#comment-17475967 ]
Hyukjin Kwon commented on SPARK-37882:
--------------------------------------
[~mattvan83], would you mind providing a self-contained reproducer?
> pyarrow.lib.ArrowInvalid: Can only convert 1-dimensional array values
> ---------------------------------------------------------------------
>
> Key: SPARK-37882
> URL: https://issues.apache.org/jira/browse/SPARK-37882
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.2.0
> Environment: Ubuntu 18.04
> Reporter: Matthieu Vanhoutte
> Priority: Major
>
> Hello,
> When trying to convert a pandas DataFrame, {{ss_corpus_dataframe}}, which
> contains one column of two-dimensional NumPy arrays, into a
> pandas-on-Spark DataFrame with the following code:
> {code:python}
> import pyspark.pandas as ps
>
> df = ps.from_pandas(ss_corpus_dataframe){code}
> I got the following error:
> {code:java}
> Traceback (most recent call last):
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 375, in run_asgi
>     result = await app(self.scope, self.receive, self.send)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
>     return await self.app(scope, receive, send)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/middleware/message_logger.py", line 82, in __call__
>     raise exc from None
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/middleware/message_logger.py", line 78, in __call__
>     await self.app(scope, inner_receive, inner_send)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/fastapi/applications.py", line 208, in __call__
>     await super().__call__(scope, receive, send)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/applications.py", line 112, in __call__
>     await self.middleware_stack(scope, receive, send)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
>     raise exc
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
>     await self.app(scope, receive, _send)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
>     raise exc
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
>     await self.app(scope, receive, sender)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/routing.py", line 656, in __call__
>     await route.handle(scope, receive, send)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/routing.py", line 259, in handle
>     await self.app(scope, receive, send)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/routing.py", line 61, in app
>     response = await func(request)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/fastapi/routing.py", line 226, in app
>     raw_response = await run_endpoint_function(
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/fastapi/routing.py", line 159, in run_endpoint_function
>     return await dependant.call(**values)
>   File "./app/routers/semantic_searches.py", line 60, in create_semantic_search
>     date_time_sem_search, clean_query, output_dict, error_code = await apply_semantic_search_async(query=query, api_sent_embed_url=settings.api_sent_embed_address, ss_corpus_dataframe=ss_corpus_dataframe.dataframe, id_matrices=id_matrices, top_k=75, similarity_score_thresh=0.5)
>   File "./app/backend/semantic_search/sts_tf_semantic_search.py", line 134, in apply_semantic_search_async
>     df = ps.from_pandas(ss_corpus_dataframe)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/namespace.py", line 143, in from_pandas
>     return DataFrame(pobj)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/frame.py", line 520, in __init__
>     internal = InternalFrame.from_pandas(pdf)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/internal.py", line 1460, in from_pandas
>     ) = InternalFrame.prepare_pandas_frame(pdf)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/internal.py", line 1533, in prepare_pandas_frame
>     spark_type = infer_pd_series_spark_type(reset_index[col], dtype)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/typedef/typehints.py", line 329, in infer_pd_series_spark_type
>     return from_arrow_type(pa.Array.from_pandas(pser).type)
>   File "pyarrow/array.pxi", line 904, in pyarrow.lib.Array.from_pandas
>   File "pyarrow/array.pxi", line 302, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Can only convert 1-dimensional array values{code}
> Would it be possible to add support for converting multi-dimensional
> array values from pandas to pandas-on-Spark?
--
This message was sent by Atlassian Jira
(v8.20.1#820001)