Great questions, Tim!

Detailed answers to follow soon...

On Tue, Jul 19, 2016 at 6:22 AM, Tim Ellison <[email protected]> wrote:

> I'm trying to write the simplest of examples using Pirk, so I can
> understand what is happening, and I'm stumbling a bit in some of the
> assumptions and side effects ...
>
> I have a simple data schema
>         https://paste.apache.org/TcxK
>
> describing my data file
>         https://paste.apache.org/8QDH
>
> The query schema is
>         https://paste.apache.org/gV02
>
> And finally, here's my first attempt to create a query/querier
>         https://paste.apache.org/1IpL
>
>
> Observations and questions so far:
>
>  * I've had a stab at defining XSDs for the schema files to help me
> verify them.  There is a PR in the queue for you to review, to see
> if I got it right.
>
>
>  * It seems I must put the schemas into a file.  It would be useful to
> have an API to define the schema directly.
>
>     - I can see why the data schema is likely to be fixed, and therefore
> not unusual to be in a file, but for ad hoc queries I'm assuming I may
> want to just send the query schema alongside the Query to the responder?
>
>
>  * My data is in JSON format, but the schema is in XML - would be useful
> to be able to specify the schema in a variety of formats, e.g.
> json-schema for JSON data.
>
>     - Maybe this is one area where the schema provider can be more
> flexible.
>
>
>  * My first touch of the SystemConfiguration class (line: 31) causes an
> attempt to read the schemas [1] before I get a chance to set the
> required properties.  I am calling #initialize on the loaders again to
> do the actual work.
>
>   - Why does SystemConfiguration<clinit> load the query schemas before
> it can have any properties set?
>   - It would be helpful to have an API to define additional schemas
> incrementally at runtime.  At the moment, I assume I must call
> LoadQuerySchemas#getSchemaMap() and manipulate the map directly [2].
>
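A minimal sketch of the Java behaviour behind the first question, assuming nothing about Pirk's internals: a static initializer runs the first time a class is used, so any work done there happens before the first setProperty call on that class can take effect. EagerConfig and its members are hypothetical stand-ins, not Pirk's SystemConfiguration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for a configuration class with an eager static
// initializer. This is NOT Pirk's SystemConfiguration.
final class EagerConfig {
  static final Properties PROPS = new Properties();
  static final List<String> LOG = new ArrayList<>();

  static {
    // Runs once, when the class is first used. At this point no caller
    // has been able to set "query.schemas" through this class yet.
    LOG.add("static init saw query.schemas=" + PROPS.getProperty("query.schemas", "none"));
  }

  private EagerConfig() {}

  static void setProperty(String key, String value) {
    PROPS.setProperty(key, value);
    LOG.add("set " + key);
  }
}

class StaticInitSketch {
  public static void main(String[] args) {
    // The first touch of EagerConfig triggers the static block, and only
    // afterwards does the body of setProperty execute.
    EagerConfig.setProperty("query.schemas", "my-query-schema.xml");
    EagerConfig.LOG.forEach(System.out::println);
  }
}
```

In this sketch the log shows the static block observing no value for query.schemas before the property is recorded, which is the same shape as the behaviour described above. Deferring the load to an explicit initialize call is one way to avoid it.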
>
>  * The order of loading the schemas is important: the data schemas
> must be loaded before the query schemas (as there is a back reference
> that is checked at load time), so it becomes
>     SystemConfiguration.setProperty("data.schemas", "...");
>     SystemConfiguration.setProperty("query.schemas", "...");
>     LoadDataSchemas.initialize();
>     LoadQuerySchemas.initialize();
>
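A minimal sketch of why the ordering matters, assuming only what is described above (a query schema names a data schema, and that reference is checked when the query schema is loaded). SchemaRegistrySketch and its methods are hypothetical and are not Pirk's LoadDataSchemas or LoadQuerySchemas.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical registry illustrating a load-time back reference check.
final class SchemaRegistrySketch {
  private final Set<String> dataSchemas = new HashSet<>();
  // Maps query schema name -> name of the data schema it refers to.
  private final Map<String, String> querySchemas = new HashMap<>();

  void loadDataSchema(String name) {
    dataSchemas.add(name);
  }

  void loadQuerySchema(String name, String dataSchemaName) {
    // The back reference is validated at load time, so the data schema
    // must already be registered.
    if (!dataSchemas.contains(dataSchemaName)) {
      throw new IllegalStateException(
          "query schema '" + name + "' refers to unknown data schema '" + dataSchemaName + "'");
    }
    querySchemas.put(name, dataSchemaName);
  }

  boolean hasQuerySchema(String name) {
    return querySchemas.containsKey(name);
  }
}

class LoadOrderSketch {
  public static void main(String[] args) {
    SchemaRegistrySketch registry = new SchemaRegistrySketch();
    try {
      registry.loadQuerySchema("simple query", "simple data"); // wrong order
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    registry.loadDataSchema("simple data");                    // right order
    registry.loadQuerySchema("simple query", "simple data");
    System.out.println("loaded: " + registry.hasQuerySchema("simple query"));
  }
}
```

A loader that resolved the reference lazily, at query time rather than load time, would not impose this ordering, at the cost of reporting a bad reference later.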
>
>  * Now I create the QueryInfo object.  No idea what a number of these
> parameters are doing ;-)  but they do seem to relate to the core
> function, plus parts of the Paillier algorithm and Hadoop integration
> too (just wondering if they should be there or kept elsewhere?).
>
>   - If QueryInfo is API then it needs more user-level doc.
>
>
> I've not tried running the query yet!  Just the first baby steps, so
> stop me if I'm heading in the wrong direction.
>
> [1] https://github.com/apache/incubator-pirk/blob/master/src/main/java/org/apache/pirk/utils/SystemConfiguration.java#L73
> [2] At some point I'd like to understand the trust model for Pirk, i.e.
> where the boundary is between trusted/untrusted code.
>
> I appreciate your patience!
>
> Regards,
> Tim
>
>
>
>
