[ https://issues.apache.org/jira/browse/ARROW-15547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488109#comment-17488109 ]
Joris Van den Bossche edited comment on ARROW-15547 at 2/7/22, 1:30 PM:
------------------------------------------------------------------------

Can you provide a reproducible code example of the issue you encounter? With the data that you have provided so far, the function works fine for me using pyarrow 6.0 (but there are no decimals in the resulting table, as pyarrow doesn't infer this type automatically from plain numbers):

{code}
In [3]: null = None

In [4]: data = [{"accounted_at": ....  # data as provided above

In [6]: create_dataframe(data)
Out[6]:
pyarrow.Table
booked_by: string
invoice_recipient_id: string
created_at: string
due_date: string
lines: list<item: struct<amount: double, commission: double, commissionUnit: string, description: string, soldPrice: double, type: string>>
  child 0, item: struct<amount: double, commission: double, commissionUnit: string, description: string, soldPrice: double, type: string>
      child 0, amount: double
      child 1, commission: double
      child 2, commissionUnit: string
      child 3, description: string
      child 4, soldPrice: double
      child 5, type: string
deleted_at: null
internal_code: string
type: string
id: string
payment_term: string
franchise_id: string
teamleader_id: string
created_by: string
parent_id: null
sent_by: string
accounted_at: string
recipient_emails: null
booked_at: string
status: string
description: string
sent_at: string
{code}

> Regression: Decimal type inference
> ----------------------------------
>
>                 Key: ARROW-15547
>                 URL: https://issues.apache.org/jira/browse/ARROW-15547
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 6.0.1
>            Reporter: Charley Guillaume
>            Priority: Major
>
> While trying to ingest data with pyarrow 6.0.1 using this function:
> {code:python}
> import pyarrow as pa
>
> def create_dataframe(list_dict: list) -> pa.Table:
>     # Collect the union of all keys that appear in any row.
>     fields = set()
>     for d in list_dict:
>         fields = fields.union(d.keys())
>     # Build one column per key; rows missing a key contribute None.
>     dataframe = pa.table({f: [row.get(f) for row in list_dict] for f in fields})
>     return dataframe
> {code}
> I had the following error:
> {code}
> pyarrow.lib.ArrowInvalid: Decimal type with precision 7 does not fit into precision inferred from first array element: 8
> {code}
> After downgrading to v4.0.1 the error was gone.
> The data looked like this:
> {noformat}
> [{"accounted_at": "2022-01-31T22:55:25.702000+00:00", "booked_at":
> "2022-01-27T09:24:17.539000+00:00", "booked_by":
> "7b3ce009-728d-4fbc-9120-00fa8c1c8655", "created_at":
> "2022-01-27T09:08:22.306000+00:00", "created_by":
> "7b3ce009-728d-4fbc-9120-00fa8c1c8655", "deleted_at": null, "description":
> "description of the record", "due_date": "2022-02-10T00:00:00+00:00",
> "franchise_id": "9a2858c4-5c71-43d3-b28f-2352de47ff9f", "id":
> "ba3f6d3a-12f4-4d78-acc5-2e59ca384c1e", "internal_code": "A.2022 / 9",
> "invoice_recipient_id": "7169cef9-9cb2-461f-a38f-a4d1ce3ca1c3", "lines":
> [{"type": "property", "amount": 7800, "soldPrice": 260000, "commission": 3,
> "description": "Honoraires de l'agence", "commissionUnit": "PERCENT"}],
> "parent_id": null, "payment_term": "14-days", "recipient_emails": null,
> "sent_at": null, "sent_by": null, "status": "booked", "teamleader_id":
> "xxx-yyy-www-zzz", "type": "out"}, {"accounted_at": null, "booked_at":
> "2022-01-05T09:23:03.274000+00:00", "booked_by":
> "8a91a22d-ddb9-491a-bc2d-c06ff3f256b4", "created_at":
> "2022-01-05T09:21:32.503000+00:00", "created_by":
> "8a91a22d-ddb9-491a-bc2d-c06ff3f256b4", "deleted_at": null, "description":
> "Description content", "due_date": "2022-02-04T00:00:00+00:00",
> "franchise_id": "929d47a3-c30f-404b-aaff-c96cff1bdd10", "id":
> "828cd056-6aa7-4cea-9c94-ffa2db4498df", "internal_code": "BXC22 / 3",
> "invoice_recipient_id": "5f90aa24-4c32-401d-927c-db9d4a9f90bf", "lines":
> [{"type": "property", "amount": 92.55, "soldPrice": 3702.02, "commission":
> 2.5, "description": "description2", "commissionUnit": "PERCENT"}],
> "parent_id": null, "payment_term": "30-days", "recipient_emails": null,
> "sent_at": "2022-01-05T09:27:34.077000+00:00", "sent_by":
> "8a91a22d-ddb9-491a-bc2d-c06ff3f256b4", "status": "credited",
> "teamleader_id": "xxx-yzyzy-zzz-www", "type": "out"}]
> {noformat}
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
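
The error in the report mentions decimal precision, while the sample data holds only plain numbers, which pyarrow reads as double. As a rough sketch, assuming the real input held decimal.Decimal objects (an assumption, not something the report confirms), the standard-library snippet below shows how precision and scale differ between values taken from the "lines" entries, and which single decimal(precision, scale) type would hold them all. The helper names precision_and_scale and common_decimal_type are hypothetical and are not part of pyarrow.

```python
from decimal import Decimal
from typing import Iterable, Tuple


def precision_and_scale(value: Decimal) -> Tuple[int, int]:
    """Return (precision, scale) for a finite Decimal.

    Hypothetical helper: precision is the total digit count and scale the
    number of fractional digits, as in a decimal(precision, scale) type.
    """
    _sign, digits, exponent = value.as_tuple()
    scale = max(-exponent, 0)
    # A positive exponent adds trailing integer zeros; a small fraction
    # such as 0.001 needs at least `scale` digits.
    precision = max(len(digits) + max(exponent, 0), scale)
    return precision, scale


def common_decimal_type(values: Iterable[Decimal]) -> Tuple[int, int]:
    """Smallest (precision, scale) able to hold every value (hypothetical helper)."""
    parts = [precision_and_scale(v) for v in values]
    scale = max(s for _, s in parts)
    integer_digits = max(p - s for p, s in parts)
    return integer_digits + scale, scale


# Amounts from the report, wrapped in Decimal purely for illustration.
amounts = [Decimal("92.55"), Decimal("3702.02"), Decimal("7800")]
for amount in amounts:
    print(amount, precision_and_scale(amount))
print("common type:", common_decimal_type(amounts))  # common type: (6, 2)
```

Printing these pairs for the actual failing column may help narrow down which element triggers the precision error. The snippet does not reproduce pyarrow's own inference logic; it only makes the per-value precision and scale visible.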