> But I can't figure out how to express "select struct field 0 from field 2
> of the original table where field 2 is a struct column"
>
> Any idea how the substrait message should look like for the above?

I believe it would be:

```
"expression": {
  "selection": {
    "direct_reference": {
      "struct_field": {
        "field": 2,
        "child": {
          "struct_field": { "field": 0 }
        }
      }
    },
    "root_reference": { }
  }
}
```
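
If you need to produce this kind of reference for arbitrary nesting depth, a small helper along these lines might be useful. This is only a sketch: it builds plain dicts that mirror the JSON above rather than using the Substrait protobuf classes, and `nested_field_reference` is a name made up for illustration, not part of any library.

```python
import json


def nested_field_reference(path):
    """Build a Substrait-style field reference (as a plain dict) from field indices.

    `path` lists field indices from the outermost column to the innermost
    struct member, e.g. [2, 0] for "field 0 of the struct in column 2".
    """
    if not path:
        raise ValueError("path must contain at least one field index")
    segment = None
    # Work from the innermost index outward so each segment wraps its child.
    for index in reversed(path):
        struct_field = {"field": index}
        if segment is not None:
            struct_field["child"] = segment
        segment = {"struct_field": struct_field}
    return {"selection": {"direct_reference": segment, "root_reference": {}}}


# Field 0 of the struct stored in column 2 of the input table.
expression = nested_field_reference([2, 0])
print(json.dumps(expression, indent=2))
```

A single-element path such as `[1]` gives the ordinary top-level column selection, so the same helper covers both cases.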

To get the above I used the following Python. It requires [1], which could
use a review, and you also need some way to convert the binary Substrait
message to JSON (I used a script I have lying around):

```
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> schema = pa.schema([pa.field("points", pa.struct([
...     pa.field("x", pa.float64()), pa.field("y", pa.float64())]))])
>>> expr = pc.field(("points", "x"))
>>> expr.to_substrait(schema)
<pyarrow.Buffer address=0x5602249c9970 size=92 is_cpu=True is_mutable=False>
```

[1] https://github.com/apache/arrow/pull/34834

On Tue, Aug 1, 2023 at 1:45 PM Li Jin <ice.xell...@gmail.com> wrote:

> Hi,
>
> I am recently trying to do
> (1) assign a struct type column s<v1, v2>
> (2) flatten the struct columns (by assign v1=s[v1], v2=s[v2] and drop the s
> column)
>
> via Substrait and Acero.
>
> However, I ran into the problem where I don't know the proper substrait
> message to encode this (for (2))
>
> Normally, if I select a column from the origin table, it would look like
> this (e.g, select column index 1 from the original table):
>
> selection {
>   direct_reference {
>     struct_field {
>         1
>     }
>   }
> }
>
> But I can't figure out how to express "select struct field 0 from field 2
> of the original table where field 2 is a struct column"
>
> Any idea how the substrait message should look like for the above?
>
