There is also JMESPath (http://jmespath.org/), which is quite similar to JsonPath but does have a spec and does not use the leading $ character. The AWS CLI uses JMESPath for defining queries.

On Mon, Jan 7, 2019 at 1:05 PM Reuven Lax <re...@google.com> wrote:

>
> On Mon, Jan 7, 2019 at 1:44 AM Robert Bradshaw <rober...@google.com> wrote:
>
>> On Sun, Jan 6, 2019 at 12:46 PM Reuven Lax <re...@google.com> wrote:
>> >
>> > Some time ago, @Jean-Baptiste Onofré made the excellent suggestion that we look into using JsonPath as a selector format for schema fields. This provides a simple and natural way for users to select nested schema fields, as well as wildcards. This would allow users to more simply select nested fields using the Select transform, e.g.:
>> >
>> >     p.apply(Select.fields("event.userid", "event.location.*"));
>> >
>> > It would also fit into NewDoFn (Java) like this:
>> >
>> >     @ProcessElement
>> >     public void process(@Field("userid") String userId,
>> >                         @Field("action.location.*") Location location) {
>> >     }
>> >
>> > After some investigation, I believe that we're better off with something very close to a subset of JsonPath, but not precisely JsonPath.
>>
>> I am very wary of creating something that's very close to, but not quite, a (subset of) a well established standard. Is there a disadvantage to not being a strict actual subset? If we go this route, we should at least ensure that any divergence is illegal JsonPath rather than having different semantic meaning.
>>
>
> As far as I can tell, JsonPath isn't much of a "standard." There doesn't seem to be much of a spec other than implementation.
>
> For the most part, I am speaking of a strict subset of JsonPath. The only incompatibility is that JsonPath expressions all start with a '$' (which represents the root node). So in the above expression you would write "$.action.location.*" instead. I think staying closer to BeamSql syntax makes more sense here, and I would like to dispense with the need to begin with a $ character. JsonPath also assumes that each object is also a JavaScript object (which makes no sense here), and some of the JsonPath features are based on that.
>
>> > JsonPath has many features that are Javascript specific (e.g. the ability to embed Javascript expressions). JsonPath also includes the ability to do complex filtering and aggregation, which I don't think we want here; Beam already provides the ability to do such filtering and aggregation, and it's not needed here. One example of a change: JsonPath queries always begin with $ (representing the root node), and I think we're better off not requiring that so that these queries look more like BeamSql queries.
>> >
>> > I've created a small ANTLR grammar (which has the advantage that it's easy to extend) for these expressions and have everything working in a branch. However there are a few more features of JsonPath that might be useful here, and I wanted community feedback to see whether it's worth implementing them.
>> >
>> > The first are array/map slices and selectors. Currently if a schema contains an array (or map) field, you can only select all elements of the array or map. JsonPath however supports selecting and slicing the array. For example, consider the following:
>> >
>> >     @DefaultSchema(JavaFieldSchema.class)
>> >     public class Event {
>> >       public final String userId;
>> >       public final List<Action> actions;
>> >     }
>> >
>> > Currently you can apply Select.fields("actions.location"), and that will return a schema containing a list of Locations, one for every action in the original event. If we allowed slicing, you could instead write Select.fields("actions[0:9].locations"), which would do the same but only for the first 10 elements of the array.
>> >
>> > Is this useful in Beam? It would not be hard to implement, but I want to see what folks think first.
>> >
>> > The second feature is recursive field selection. The example often given in JsonPath is a Json document containing the inventory for a store. There are lists of subobjects representing books, bicycles, tables, chairs, etc. etc. The JsonPath query "$..price" recursively finds every object that has a field named price, and returns those prices; in this case it returns the price of every element in the store.
>> >
>> > I'm a bit less convinced that recursive field selection is useful in Beam. The usual example for Json involves a document that represents an entire corpus, e.g. a store inventory. In Beam, the schemas are applied to individual records, and I don't know how often there will be a use for this sort of recursive selection. However I could be wrong here, so if anyone has a good use case for this sort of selection, please let me know.
>>
>> Records often contain lists, e.g. the record could be an order, and it could be useful to select on the price of the items (just to throw it out there).
>>
>
> BTW, that already works. The .. operator in JsonPath is a recursive field search, across any lists or records that are lower in the tree.
>
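
To make the recursive ".." search from the thread concrete, here is a minimal sketch in plain Java. It is not Beam code and not the ANTLR-based implementation mentioned above: it assumes a record is modelled as nested java.util.Map and java.util.List values, and the class name RecursiveFieldSearch and the method findAll are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: records are modelled as nested Maps and
// Lists, not as Beam Rows with schemas.
public class RecursiveFieldSearch {

    /** Collects every value stored under fieldName at any depth below node. */
    public static List<Object> findAll(Object node, String fieldName) {
        List<Object> matches = new ArrayList<>();
        collect(node, fieldName, matches);
        return matches;
    }

    private static void collect(Object node, String fieldName, List<Object> matches) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                // A matching key contributes its value to the result...
                if (fieldName.equals(entry.getKey())) {
                    matches.add(entry.getValue());
                }
                // ...and the search continues below it either way.
                collect(entry.getValue(), fieldName, matches);
            }
        } else if (node instanceof List) {
            // Lists are traversed element by element, in list order.
            for (Object element : (List<?>) node) {
                collect(element, fieldName, matches);
            }
        }
        // Scalars have no children, so there is nothing more to visit.
    }

    public static void main(String[] args) {
        // An order record with a list of items, as in the example above.
        Map<String, Object> book = Map.of("sku", "book", "price", 12.5);
        Map<String, Object> bicycle = Map.of("sku", "bicycle", "price", 199.0);
        List<Object> items = List.of(book, bicycle);
        Map<String, Object> order = Map.of("orderId", "o-1", "items", items);

        // Roughly what "$..price" would select from this one record.
        System.out.println(findAll(order, "price"));
    }
}
```

The sketch walks into lists as well as nested records, which is why the "price of the items in an order" case is covered by the same operator. It does not attempt slicing. As a side note, Jayway JsonPath treats the end index of a slice as exclusive; whether the syntax proposed in this thread would follow that convention is not settled here.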