Matthieu Vanhoutte created SPARK-37882:
------------------------------------------

             Summary: pyarrow.lib.ArrowInvalid: Can only convert 1-dimensional 
array values
                 Key: SPARK-37882
                 URL: https://issues.apache.org/jira/browse/SPARK-37882
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.2.0
         Environment: Ubuntu 18.04
            Reporter: Matthieu Vanhoutte


Hello,

When trying to convert a pandas dataframe
{code:java}
ss_corpus_dataframe{code}
(which contains one column whose cells are two-dimensional NumPy arrays) into a
pandas-on-Spark dataframe with the following code:
{code:java}
df = ps.from_pandas(ss_corpus_dataframe){code}
I get the following error:
{code:java}
Traceback (most recent call last):
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py",
 line 375, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py",
 line 75, in __call__
    return await self.app(scope, receive, send)
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/middleware/message_logger.py",
 line 82, in __call__
    raise exc from None
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/middleware/message_logger.py",
 line 78, in __call__
    await self.app(scope, inner_receive, inner_send)
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/fastapi/applications.py",
 line 208, in __call__
    await super().__call__(scope, receive, send)
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/applications.py",
 line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/middleware/errors.py",
 line 181, in __call__
    raise exc
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/middleware/errors.py",
 line 159, in __call__
    await self.app(scope, receive, _send)
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/exceptions.py",
 line 82, in __call__
    raise exc
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/exceptions.py",
 line 71, in __call__
    await self.app(scope, receive, sender)
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/routing.py",
 line 656, in __call__
    await route.handle(scope, receive, send)
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/routing.py",
 line 259, in handle
    await self.app(scope, receive, send)
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/routing.py",
 line 61, in app
    response = await func(request)
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/fastapi/routing.py",
 line 226, in app
    raw_response = await run_endpoint_function(
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/fastapi/routing.py",
 line 159, in run_endpoint_function
    return await dependant.call(**values)
  File "./app/routers/semantic_searches.py", line 60, in create_semantic_search
    date_time_sem_search, clean_query, output_dict, error_code = await 
apply_semantic_search_async(query=query, 
api_sent_embed_url=settings.api_sent_embed_address, 
ss_corpus_dataframe=ss_corpus_dataframe.dataframe, id_matrices=id_matrices, 
top_k=75, similarity_score_thresh=0.5)
  File "./app/backend/semantic_search/sts_tf_semantic_search.py", line 134, in 
apply_semantic_search_async
    df = ps.from_pandas(ss_corpus_dataframe)
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/namespace.py",
 line 143, in from_pandas
    return DataFrame(pobj)
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/frame.py",
 line 520, in __init__
    internal = InternalFrame.from_pandas(pdf)
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/internal.py",
 line 1460, in from_pandas
    ) = InternalFrame.prepare_pandas_frame(pdf)
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/internal.py",
 line 1533, in prepare_pandas_frame
    spark_type = infer_pd_series_spark_type(reset_index[col], dtype)
  File 
"/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/typedef/typehints.py",
 line 329, in infer_pd_series_spark_type
    return from_arrow_type(pa.Array.from_pandas(pser).type)
  File "pyarrow/array.pxi", line 904, in pyarrow.lib.Array.from_pandas
  File "pyarrow/array.pxi", line 302, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Can only convert 1-dimensional array values{code}
Would it be possible to add support for converting multi-dimensional array
values from pandas to pandas-on-Spark?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
