I have been musing on this JIRA:

Path - multiple symbol matches per row
https://issues.apache.org/jira/browse/MADLIB-943

and have become concerned about combinatorial explosion, even for a modest
number of symbol hits per row.

For n symbols per row and m rows in a partition, the number of symbol
combinations per partition is n^m.

For example, n=2 and m=50 gives 2^50, or about 1.1e15, symbol combinations,
which we certainly don't want to traverse.
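To make the growth concrete, here is a small Python sketch of the arithmetic. It is only an illustration of the counting argument, not anything taken from MADlib; the function names and the toy symbols "A" and "B" are made up for this example.

```python
from itertools import product


def count_symbol_combinations(n: int, m: int) -> int:
    """Number of symbol sequences when each of m rows matches n symbols."""
    return n ** m


def enumerate_symbol_sequences(row_matches):
    """Expand per-row symbol matches into every possible symbol sequence.

    row_matches is a list with one entry per row; each entry is the list
    of symbols that row matches. Only practical for tiny inputs.
    """
    return ["".join(seq) for seq in product(*row_matches)]


# The case from the text: 2 symbols per row, 50 rows per partition.
print(count_symbol_combinations(2, 50))  # 1125899906842624

# A tiny partition (3 rows, 2 hypothetical symbols each) can be enumerated.
small = enumerate_symbol_sequences([["A", "B"]] * 3)
print(len(small))  # 8
```

Even at three rows the enumeration already has 8 sequences, so brute-force expansion is only an option for very small partitions.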

Does anyone have experience or an opinion on this topic?

In the current version of MADlib.path()
http://madlib.incubator.apache.org/docs/latest/group__grp__path.html
a given row can only match one symbol. If a row matches multiple symbols,
the symbol that comes first in the symbol definition list will take
precedence.
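
As I understand the rule above, it amounts to "first match wins". A rough Python sketch of that behavior follows. It is not the actual MADlib code: the predicates are plain Python functions standing in for the SQL symbol expressions, and assign_symbol and the sample definitions are hypothetical.

```python
def assign_symbol(row, symbol_defs):
    """Return the first symbol whose predicate matches the row, else None.

    symbol_defs is an ordered list of (name, predicate) pairs, so earlier
    definitions take precedence over later ones.
    """
    for name, predicate in symbol_defs:
        if predicate(row):
            return name
    return None


# Hypothetical symbol definitions; a row with x = 10 satisfies both.
symbol_defs = [
    ("A", lambda row: row["x"] > 0),
    ("B", lambda row: row["x"] > 5),
]

print(assign_symbol({"x": 10}, symbol_defs))  # A
print(assign_symbol({"x": -1}, symbol_defs))  # None
```

With this rule each row contributes exactly one symbol, so a partition of m rows yields a single sequence instead of n^m.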

Some examples I have seen elsewhere, e.g.
https://aster-community.teradata.com/community/learn-aster/blog/2015/07/01/super-sweet-npath-examples-with-source-code
seem to use multiple symbols per row.

The question is: do we need to address this at all?

Frank
