Re: conver panda image column to spark dataframe

2023-08-03 Thread Sean Owen
pp4 has one row, I'm guessing, containing an array of 10 images. You want
10 rows of 1 image each.
But just don't do this. Pass the bytes of the image as an array,
along with width/height/channels, and reshape it on use. It's just easier.
That is how the Spark image representation works anyway.
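
A minimal sketch of the bytes-plus-shape approach described above, assuming
8-bit images in height x width x channels layout. The helper names
(encode_image, decode_image, images_to_spark) and the column names are made up
for illustration and do not come from the thread. The Spark part is only a
guess at how one might wire it up; it needs pyspark and an existing
SparkSession.

```python
import numpy as np


def encode_image(img):
    """Flatten an H x W x C image into (height, width, channels, raw bytes)."""
    # Assumption: pixel values fit in uint8 (0..255).
    img = np.ascontiguousarray(img, dtype=np.uint8)
    height, width, channels = img.shape
    return height, width, channels, bytearray(img.tobytes())


def decode_image(height, width, channels, data):
    """Rebuild the H x W x C array from the flat byte buffer."""
    return np.frombuffer(data, dtype=np.uint8).reshape(height, width, channels)


def images_to_spark(spark, images):
    """Build a DataFrame with one row per image (hypothetical helper)."""
    # Imported lazily so the NumPy-only helpers work without pyspark.
    from pyspark.sql.types import (BinaryType, IntegerType, StructField,
                                   StructType)
    schema = StructType([
        StructField("height", IntegerType(), nullable=False),
        StructField("width", IntegerType(), nullable=False),
        StructField("channels", IntegerType(), nullable=False),
        StructField("data", BinaryType(), nullable=False),
    ])
    return spark.createDataFrame([encode_image(i) for i in images], schema)


# Round trip on a small synthetic image; no Spark is needed for this part.
original = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
restored = decode_image(*encode_image(original))
```

On the Spark side, decode_image can be applied to the four columns of a row
wherever the pixels are actually needed.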



Re: conver panda image column to spark dataframe

2023-08-03 Thread second_co...@yahoo.com.INVALID
Hello Adrian,

here is the snippet:

import pandas as pd
import tensorflow_datasets as tfds
from pyspark.sql.types import ArrayType, IntegerType, StructField, StructType

(ds_train, ds_test), ds_info = tfds.load(
    dataset_name, data_dir='', split=["train", "test"],
    with_info=True, as_supervised=True
)

# One image per row: three nested ArrayType layers (height, width, channels).
schema = StructType([
    StructField("image", ArrayType(ArrayType(ArrayType(IntegerType()))),
                nullable=False),
    StructField("label", IntegerType(), nullable=False)
])
pp4 = spark.createDataFrame(
    pd.DataFrame(tfds.as_dataframe(ds_train.take(4), ds_info)), schema)



This raised the error:

TypeError: field image: ArrayType(ArrayType(ArrayType(IntegerType(), True),
True), True) can not accept object array([[[14, 14, 14],
[14, 14, 14],
[14, 14, 14],
...,
[19, 17, 20],
[19, 17, 20],
[19, 17, 20]],
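
The object named in the error message is a numpy.ndarray, and the schema
verifier apparently does not treat that as a valid value for ArrayType. One
possible workaround, sketched here under that assumption, is to turn every
image cell into plain nested Python lists before calling createDataFrame. The
function names are hypothetical, and the synthetic arrays stand in for the tfds
data.

```python
import numpy as np


def to_nested_lists(cell):
    """Convert one ndarray cell into nested Python lists of plain ints."""
    return np.asarray(cell).tolist()


def rows_for_spark(images, labels):
    """One (image, label) tuple per row, with no NumPy objects left inside."""
    return [(to_nested_lists(img), int(lbl))
            for img, lbl in zip(images, labels)]


def make_dataframe(spark, images, labels):
    """Hypothetical helper; needs pyspark and an existing SparkSession."""
    # Imported lazily so the conversion helpers work without pyspark.
    from pyspark.sql.types import (ArrayType, IntegerType, StructField,
                                   StructType)
    schema = StructType([
        StructField("image", ArrayType(ArrayType(ArrayType(IntegerType()))),
                    nullable=False),
        StructField("label", IntegerType(), nullable=False),
    ])
    return spark.createDataFrame(rows_for_spark(images, labels), schema)


# Four tiny synthetic "images" of shape (5, 6, 3) in place of real data.
images = np.zeros((4, 5, 6, 3), dtype=np.uint8)
labels = np.array([0, 1, 2, 3])
rows = rows_for_spark(images, labels)
```

Nested lists of this size are slow to build and to serialize, which is one
reason the flat bytes representation may be preferable for real images.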





Re: conver panda image column to spark dataframe

2023-08-03 Thread Adrian Pop-Tifrea
Hello,

Can you also please show us how you created the pandas dataframe? I mean,
how you added the actual data into the dataframe. It would help us
reproduce the error.

Thank you,
Pop-Tifrea Adrian



Re: conver panda image column to spark dataframe

2023-07-31 Thread second_co...@yahoo.com.INVALID
I changed to

ArrayType(ArrayType(ArrayType(IntegerType()))), still get the same error.

Thank you for responding.


Re: conver panda image column to spark dataframe

2023-07-27 Thread Adrian Pop-Tifrea
Hello,

When you said your pandas DataFrame has 10 rows, does that mean it contains
10 images? Because if that's the case, then you'd want to use only 3 layers
of ArrayType when you define the schema.

Best regards,
Adrian
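
A small sketch of the reasoning above: the schema describes one row, so the
number of ArrayType layers follows the dimensions of a single image, not of the
whole stack. The array here is a synthetic stand-in with the shapes from the
question, and nested_array_type is a hypothetical helper that needs pyspark.

```python
import numpy as np

# Synthetic stand-in: 10 images of shape (500, 333, 3) stacked together.
stacked = np.zeros((10, 500, 333, 3), dtype=np.uint8)

# Each DataFrame row holds one image, i.e. one slice along the first axis.
cells = list(stacked)

# One ArrayType layer per dimension of a single cell.
layers_needed = cells[0].ndim


def nested_array_type(depth):
    """Wrap IntegerType in `depth` layers of ArrayType (hypothetical helper)."""
    # Imported lazily so the NumPy part above works without pyspark.
    from pyspark.sql.types import ArrayType, IntegerType
    dtype = IntegerType()
    for _ in range(depth):
        dtype = ArrayType(dtype)
    return dtype
```

With pyspark installed, nested_array_type(layers_needed) would give the
three-layer type for the "image" field.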



On Thu, Jul 27, 2023, 11:04 second_co...@yahoo.com.INVALID
 wrote:

> I have a pandas dataframe with a column 'image' holding numpy.ndarray
> values. The shape is (500, 333, 3) per image. My pandas dataframe has 10
> rows, so the overall shape is (10, 500, 333, 3).
>
> When using spark.createDataFrame(panda_dataframe, schema), I need to
> specify the schema:
>
> schema = StructType([
>     StructField("image",
>         ArrayType(ArrayType(ArrayType(ArrayType(IntegerType())))),
>         nullable=False)
> ])
>
> I get this error:
>
> raise TypeError(
> TypeError: field image:
> ArrayType(ArrayType(ArrayType(ArrayType(IntegerType(), True), True), True),
> True) can not accept object array([[[14, 14, 14],
>
> ...
>
> Can you advise how to set the schema for an image stored as numpy.ndarray?
>
>
>
>