Wail,

Great inputs/requirements! We should definitely think about how to address these. One thing that could help with the second item would be "functional indexes", i.e., support for indexing on an expression rather than just on base data. Some systems (e.g., PostgreSQL) support that, and it's not rocket science. It could make data that's convertible to spatial data via a function call spatially indexable. As for the first point, I'm not sure I "get it": are external indexes not good enough? Oh, wait, is the issue that we should offer per-object transformations during load? (E.g., the ability to put a UDF on the load pipeline, like we do on the feed pipeline?)
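
To make the functional-index idea a bit more concrete, here is a minimal Python sketch. It is not AsterixDB or PostgreSQL code: `FunctionalIndex` and `grid_cell` are hypothetical names made up for illustration, and bucketing by grid cell is only a crude stand-in for a real spatial index (e.g., an R-tree). The point it tries to show is that the index key is computed by an expression over the record, so raw longitude/latitude fields become searchable without first being rewritten into a point-typed field.

```python
from collections import defaultdict
from typing import Callable, Hashable


class FunctionalIndex:
    """Toy index keyed on the result of an expression, not on a stored field."""

    def __init__(self, key_fn: Callable[[dict], Hashable]):
        self.key_fn = key_fn                 # the indexed expression
        self.buckets = defaultdict(list)     # expression value -> records

    def insert(self, record: dict) -> None:
        # The expression is evaluated once, when the record is inserted.
        self.buckets[self.key_fn(record)].append(record)

    def lookup(self, key: Hashable) -> list:
        # Return a copy so callers cannot mutate the index's buckets.
        return list(self.buckets.get(key, []))


def grid_cell(record: dict, cell_size: float = 1.0) -> tuple:
    """Hypothetical (lon, lat) -> spatial key conversion: a 1-degree grid cell."""
    return (
        int(record["longitude"] // cell_size),
        int(record["latitude"] // cell_size),
    )


# Made-up sample records with plain numeric longitude/latitude fields.
tweets = [
    {"id": 1, "longitude": -117.84, "latitude": 33.64},
    {"id": 2, "longitude": -117.20, "latitude": 33.10},
    {"id": 3, "longitude": 46.67, "latitude": 24.71},
]

index = FunctionalIndex(grid_cell)
for tweet in tweets:
    index.insert(tweet)

# Find records whose derived key matches that of a probe location.
probe = {"longitude": -117.5, "latitude": 33.5}
nearby = index.lookup(grid_cell(probe))
print([t["id"] for t in nearby])
```

The same conversion function could in principle be applied per object on a load pipeline instead; the sketch only covers the indexing side.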

Thx!

Mike


On 9/2/16 12:50 PM, Wail Alkowaileet wrote:
Hi Dev,

Over the last year or so I have been more involved in AsterixDB. However, I'm
90% user and 10% developer (due to the nature of my work). I want to share
some of my (and my colleagues') experience with ADM, though some of it may be
too obvious.

One of the challenges we face most of the time is indexing non-ADM data.
Most of the data is in either JSON or CSV format, which means all of ADM's
richness is not usable.

For instance, when loading, I usually create an External (or Temporary)
Dataset, query/transform it, and then insert the result into my Internal
Dataset, which takes more time than a plain load as a result of flush/merge
operations.

Another challenging case is the TwitterFeed example
<https://ci.apache.org/projects/asterixdb/feeds/tutorial.html>: the
*longitude* and *latitude* fields are not indexable, so I need to ETL into
another dataset to transform (lon, lat) into a point type.

It would be awesome if we could bridge non-ADM types to ADM types.


