velvia commented on issue #597: URL: https://github.com/apache/arrow-rs/issues/597#issuecomment-886108157
I definitely agree that we don't want a specific type per timezone, and that the direction arrow2 takes is the right one.

@jorgecarleitao the concern with using Strings is twofold:

1) Parsing cost. Any timezone manipulation first requires parsing the string, and that has to be done for every single Array, since each one owns its own String and the strings could differ.
2) Storage and serialization costs.

Using Arc<> and friends would help with at least number 2, but not number 1. For fast data processing, a solution like the enum that Andrew proposed, or an enum with different timezones, for internal storage would solve both problems. One can remain compatible with the spec by still taking in a string in APIs, while allowing storage and processing to be optimized.

> On Jul 24, 2021, at 3:56 AM, Andrew Lamb <***@***.***> wrote:
>
> Good points @jorgecarleitao <https://github.com/jorgecarleitao> and @velvia <https://github.com/velvia> -- it sounds like my challenge / problem with TimestampNanosecondArray will be solved when we bring in the ideas of arrow2. If it becomes a problem I can look into removing DATA_TYPE as Jorge suggests.
>
> One thing I have noticed that might warrant more thought about something other than String for storing timezones is that the DataType struct is copied around a lot in arrow code. Maybe something more like Arc<str> would be appropriate if we ever want to change the type.
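
To make the "string in the API, enum for internal storage" idea from the comment concrete, here is a rough sketch. It is not arrow-rs or arrow2 code: the `Timezone` enum, its variants, and `parse_fixed_offset` are hypothetical names, and the accepted inputs ("UTC", a "+HH:MM"/"-HH:MM" offset, or anything else treated as a named zone) are a simplifying assumption rather than a full reading of the spec. The point is only that the string is parsed once at the boundary, after which clones and comparisons are cheap.

```rust
use std::sync::Arc;

/// Hypothetical internal timezone representation (not an arrow-rs type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Timezone {
    /// UTC, the common case; carries no payload.
    Utc,
    /// Fixed offset east of UTC, in seconds (e.g. "+05:30" is 19800).
    FixedOffset(i32),
    /// Any other string, assumed to be a named zone; Arc<str> keeps clones cheap.
    Named(Arc<str>),
}

impl Timezone {
    /// Parse the spec-level string once, at the API boundary.
    pub fn parse(s: &str) -> Timezone {
        if s == "UTC" {
            return Timezone::Utc;
        }
        match parse_fixed_offset(s) {
            Some(secs) => Timezone::FixedOffset(secs),
            None => Timezone::Named(Arc::from(s)),
        }
    }
}

/// Parse exactly "+HH:MM" or "-HH:MM" into seconds east of UTC.
fn parse_fixed_offset(s: &str) -> Option<i32> {
    let bytes = s.as_bytes();
    if bytes.len() != 6 || bytes[3] != b':' {
        return None;
    }
    let sign = match bytes[0] {
        b'+' => 1,
        b'-' => -1,
        _ => return None,
    };
    // Accept only two ASCII digits for each field.
    let two_digits = |range: std::ops::Range<usize>| -> Option<i32> {
        let part = s.get(range)?;
        if part.bytes().all(|b| b.is_ascii_digit()) {
            part.parse().ok()
        } else {
            None
        }
    };
    let hours = two_digits(1..3)?;
    let minutes = two_digits(4..6)?;
    if hours > 23 || minutes > 59 {
        return None;
    }
    Some(sign * (hours * 3600 + minutes * 60))
}
```

With something along these lines, only the `Named` variant still carries string data, and that data is shared through the `Arc` rather than copied each time the type is cloned.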
