Great questions, Tim! Detailed answers to follow soon...

On Tue, Jul 19, 2016 at 6:22 AM, Tim Ellison <[email protected]> wrote:

> I'm trying to write the simplest of examples using Pirk, so I can
> understand what is happening, and I'm stumbling a bit in some of the
> assumptions and side effects ...
>
> I have a simple data schema
>   https://paste.apache.org/TcxK
>
> describing my data file
>   https://paste.apache.org/8QDH
>
> The query schema is
>   https://paste.apache.org/gV02
>
> And finally, here's my first attempt to create a query/querier
>   https://paste.apache.org/1IpL
>
>
> Observations and questions so far:
>
> * I've had a stab at defining the xsd's for the schema files to help me
>   verify them. There is a PR in the queue for you to take a look at to
>   see if I got it right.
>
>
> * It seems I must put the schemas into a file. It would be useful to
>   have an API to define the schema directly.
>
>   - I can see why the data schema is likely to be fixed, and therefore
>     not unusual to be in a file, but for ad hoc queries I'm assuming I may
>     want to just send the query schema alongside the Query to the responder?
>
>
> * My data is in JSON format, but the schema is in XML - would be useful
>   to be able to specify the schema in a variety of formats, e.g.
>   json-schema for JSON data.
>
>   - Maybe this is one area where the schema provider can be more
>     flexible.
>
>
> * My first touch of the SystemConfiguration class (line: 31) causes an
>   attempt to read the schemas [1] before I get a chance to set the
>   required properties. I am calling #initialize on the loaders again to
>   do the actual work.
>
>   - Why does SystemConfiguration<clinit> load the query schemas before
>     it can have any properties set?
>   - Would be helpful to have an API to define additional schema
>     incrementally at runtime. At the moment, I assume I must call
>     LoadQuerySchemas#getSchemaMap() and manipulate the map directly [2].
>
>
> * The order of loading the schemas is important, must load the data
>   schemas before the query schemas (as there is a back reference that is
>   checked at load time), so it becomes
>     SystemConfiguration.setProperty("data.schemas", "...");
>     SystemConfiguration.setProperty("query.schemas", "...");
>     LoadDataSchemas.initialize();
>     LoadQuerySchemas.initialize();
>
>
> * Now I create the QueryInfo object. No idea what a number of these
>   parameters are doing ;-) but they do seem to relate to the core
>   function, plus parts of the Paillier algorithm and Hadoop integration
>   too (just wondering if they should be there or kept elsewhere?).
>
>   - If the QueryInfo is API then it needs more user level doc.
>
>
> I've not tried running the query yet! Just the first baby steps, so
> stop me if I'm heading in the wrong direction.
>
> [1]
> https://github.com/apache/incubator-pirk/blob/master/src/main/java/org/apache/pirk/utils/SystemConfiguration.java#L73
> [2] At some point I'd like to understand the trust model for Pirk, i.e.
> where the boundary is between trusted/untrusted code.
>
> I appreciate your patience!
>
> Regards,
> Tim
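
To make the load-order observation in the quoted message concrete, here is a toy sketch of a load-time back-reference check. It is only an illustration under stated assumptions: the class SchemaLoadOrderSketch, its two maps, and the methods loadDataSchema and loadQuerySchema are hypothetical stand-ins and are not Pirk's classes or API. The sketch only shows why a query schema that names a data schema cannot be registered until that data schema is known.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; these are NOT Pirk's classes.
public class SchemaLoadOrderSketch {

  // Loaded data schemas: data schema name -> raw definition.
  static final Map<String, String> dataSchemas = new HashMap<>();

  // Loaded query schemas: query schema name -> name of the data schema it refers to.
  static final Map<String, String> querySchemas = new HashMap<>();

  // Registers a data schema under the given name.
  static void loadDataSchema(String name, String definition) {
    dataSchemas.put(name, definition);
  }

  // Registers a query schema. The back reference to the data schema is
  // checked here, at load time, so the data schema must already be loaded.
  static void loadQuerySchema(String name, String dataSchemaName) {
    if (!dataSchemas.containsKey(dataSchemaName)) {
      throw new IllegalStateException(
          "query schema '" + name + "' refers to unknown data schema '" + dataSchemaName + "'");
    }
    querySchemas.put(name, dataSchemaName);
  }

  public static void main(String[] args) {
    // Wrong order: the query schema is loaded before the data schema it names.
    try {
      loadQuerySchema("simple-query", "simple-data");
    } catch (IllegalStateException e) {
      System.out.println("Wrong order rejected: " + e.getMessage());
    }

    // Right order: data schema first, then the query schema that refers to it.
    loadDataSchema("simple-data", "<schema/>");
    loadQuerySchema("simple-query", "simple-data");
    System.out.println("Loaded query schemas: " + querySchemas.keySet());
  }
}
```

An incremental "define a schema at runtime" call, as asked for in the message, would presumably need the same check, which is one argument for exposing it as a method rather than leaving callers to edit a map directly.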
