Hi Michael,

The answer to your question about metadata will likely be application-specific.
For small amounts of metadata (e.g. communicating the bounding box of the included geometry), there isn't much room for optimization, so a string could be fine. For larger amounts of metadata, or under other constraints such as metadata that must be modified frequently and independently of the data, custom encodings or a second service and/or Arrow table for the metadata could be the way to go. The metadata keys and values are UTF-8 strings, so nothing should prevent you from stuffing a base64-encoded protobuf in there.

As for whether the library is maintained: yes, it is. Lately, though, I've only had time for bug fixes and for features required to maintain parity with the spec and the other libraries. I will be using Arrow JS in my own work again soon, and that could justify more "quality of life" improvements. Those improvements don't get done unless other maintainers jump in to contribute or my work needs them.

I'd be happy to do a call with you or your team to give a short overview of and introduction to the JS library. You can also email me directly, or ask in the #arrow-js channel on the-asf.slack.com, with any questions.

Best,
Paul

On Fri, Feb 26, 2021 at 1:47 PM Michael Lavina <michael.lav...@factset.com> wrote:

> Hey Neal,
>
> Thanks for the response; I'm glad I am using this correctly. I have
> never really used email servers, so hopefully this works.
>
> That’s exactly what I was thinking of doing: creating a standard
> metadata schema built on top of Apache Arrow with some predefined user
> types.
>
> I guess I was just wondering if I was trying to use a screwdriver as a
> hammer. It can work because we are using the metadata, and that could be
> anything. But maybe, as you said, we should create a separate standard
> entirely for defining the schema used to render tables, instead of
> defining it within Arrow.
>
> Does it defeat the value of Arrow if we send the data using buffers and
> streams, plus a giant string of stringified metadata, when I could instead
> define the metadata separately in protobuf binary?
>
> In addition, I was curious whether, with all these visualization tools,
> someone has already developed a standard metadata convention for Arrow to
> help with rendering. This would cover things like how to denote grouping
> of data, relationships between columns, and hidden information.
>
> -Michael
>
> From: Neal Richardson <neal.p.richard...@gmail.com>
> Date: Friday, February 26, 2021 at 1:38 PM
> To: dev <dev@arrow.apache.org>
> Subject: Re: [JS] Exploring usage of apache arrow at my company for
> complex table rendering
>
> The Arrow IPC specification allows for custom metadata in both the Schema
> and the individual Fields:
>
> https://urldefense.com/v3/__https://arrow.apache.org/docs/format/Columnar.html*schema-message__;Iw!!PBKjc0U4!ZDNX2q8bDIOFv2QGswzYOu9kXjf-yQ_0OvCT9gc-9kIH6GXS0qYzmwCGSdcKvxxhHK7K$
>
> Might that work for you? Another alternative would be to track your
> metadata in a separate object outside of the Arrow data.
>
> Neal
>
> On Fri, Feb 26, 2021 at 5:02 AM Michael Lavina <michael.lav...@factset.com>
> wrote:
>
> > Hello Everyone,
> >
> > Some background: my name is Michael and I work at FactSet. If you use
> > Arrow, you may have heard of us, because one of our architects gave a
> > talk on using Arrow and Dremio.
> >
> > https://urldefense.com/v3/__https://hello.dremio.com/eliminate-data-transfer-bottlenecks-with-apache-arrow-flight.html?utm_medium=social-free&utm_source=linkedin&utm_term=na&utm_content=na&utm_campaign=eliminate-data-transfer-bottlenecks-with-apache-arrow-flight__;!!PBKjc0U4!ZDNX2q8bDIOFv2QGswzYOu9kXjf-yQ_0OvCT9gc-9kIH6GXS0qYzmwCGSdcKv9lV4pkV$
> >
> > His team has decided to use Arrow as a tabular data interchange format.
> > Other teams are doing other things. We are working on standardizing our
> > tabular data interchange format across the company.
> >
> > We have our own open-source columnar schema defined in protobuf:
> > https://urldefense.com/v3/__https://github.com/factset/stachschema__;!!PBKjc0U4!ZDNX2q8bDIOFv2QGswzYOu9kXjf-yQ_0OvCT9gc-9kIH6GXS0qYzmwCGSdcKv6XjzSrx$
> >
> > We looked into Apache Arrow a few years ago but decided not to use it,
> > because it was not mature enough at the time and we had two specific
> > requirements:
> >
> > 1) We needed this data not just for analytics but also for rendering,
> > and rendering requires much more complex information, such as the type
> > of the data and the relationships between data (e.g. grouping).
> >
> > 2) We need SDKs for TypeScript/JavaScript, in both the browser and Node,
> > that support both creating and consuming Arrow data.
> >
> > Now that Apache Arrow is more mature and stable (i.e. the schema and
> > SDKs are post-1.x), we are looking into it again.
> >
> >   1. We are thinking of defining specific metadata, similar to what we
> >      do for STACH, that lets us define rendering-specific information.
> >      For example, we could add a metadata key called isHidden to a Field
> >      to denote whether we should render that data column or not.
> >   2. It seems there is a well-developed JavaScript SDK that we can use.
> >      I am still reading the source code and the Observable articles to
> >      truly understand how it works.
> >      1. I read that one of the issues is that the JS library might be
> >         out of sync. Does anyone know how actively that repo is
> >         maintained?
> >      2. If work needs to be done, I think we would be able to help,
> >         given some guidance getting started with that repo.
> >
> > If possible, we would like to continue chatting about the ideas above.
> > We would also like more information about whether Apache Arrow is right
> > for the job, and about whether others are already using Arrow for
> > rendering in addition to analytics.
> >
> > To clarify what I mean: for existing rendering technologies, I know
> > tools like Falcon and Perspective exist, but those seem to be for basic
> > rendering of simple tables. I mean to create a superset of Arrow by
> > defining metadata that allows for complex nested headers and nested
> > rows, something like the image below. Then you can imagine even more
> > data attached, such as descriptions of the data and its relationships
> > to other data on the page. Imagine the dataset contains some `personId`
> > that is set not to be rendered. This personId can then be used to gather
> > more information in another API call if you wanted to render a tooltip
> > with, say, some bio information. In short, rendered tables require a lot
> > more information than just the data. Does it make sense to build this
> > on Arrow?
> >
> > Thanks,
> > Michael
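The metadata approach discussed in this thread can be sketched in TypeScript. It involves string key/value pairs on Fields and on the Schema, a field-level `isHidden` flag, and a base64-encoded binary payload. This is a minimal, library-free sketch and does not use the Arrow JS API. `FieldLike`, `visibleColumns`, the `renderSpec` key, and the convention that `isHidden` holds the string "true" are all hypothetical. The payload bytes are placeholders standing in for a serialized protobuf message, not a real one.

```typescript
// Hypothetical stand-in for an Arrow Field. The Arrow IPC format allows custom
// metadata on Fields and on the Schema as UTF-8 string key/value pairs.
interface FieldLike {
  name: string;
  metadata: Map<string, string>;
}

// Hypothetical convention: a column is hidden when its "isHidden" metadata
// value is exactly "true"; a missing key or any other value means visible.
function isHidden(field: FieldLike): boolean {
  return field.metadata.get("isHidden") === "true";
}

// Names of the columns a renderer should draw, in schema order.
function visibleColumns(fields: FieldLike[]): string[] {
  return fields.filter((field) => !isHidden(field)).map((field) => field.name);
}

// Metadata values must be strings, so binary data (for example a serialized
// protobuf message) is base64-encoded before being stored as a value.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Inverse of bytesToBase64: recover the original bytes from the string value.
function base64ToBytes(text: string): Uint8Array {
  const binary = atob(text);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Example schema: personId travels with the data but is not rendered.
const fields: FieldLike[] = [
  { name: "personId", metadata: new Map([["isHidden", "true"]]) },
  { name: "name", metadata: new Map<string, string>() },
  { name: "title", metadata: new Map([["isHidden", "false"]]) },
];

console.log(visibleColumns(fields)); // [ 'name', 'title' ]

// Placeholder bytes standing in for a serialized protobuf payload.
const payload = new Uint8Array([0x08, 0x96, 0x01, 0xff]);
const schemaMetadata = new Map<string, string>([
  ["renderSpec", bytesToBase64(payload)],
]);

// Decode the schema-level value back into bytes on the consuming side.
const restored = base64ToBytes(schemaMetadata.get("renderSpec") ?? "");
console.log(restored.length === payload.length); // true
```

Because every value is a plain string, small settings such as `isHidden` can stay human-readable. Larger structured specs can be base64-encoded or, as suggested earlier in the thread, kept in a separate object or table. Arrow JS `Field` and `Schema` objects carry their custom metadata as a string-to-string map, so the same lookups should carry over. Check the API of the library version you use.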