On 18/07/16 19:29, Ellison Anne Williams wrote:
> Good points.
>
> Yes, there are currently no examples included in the codebase (other than
> in a roundabout kind of way via the tests, which doesn't count) and the
> website doesn't have a step-by-step walkthrough of how to get up and
> running from a user perspective. We can certainly add these in.
Apologies if I am being a pain here. I'm just curious about how to use Pirk, and come at it with no background -- so I may not be the best target user. I won't be offended if you tell me that it requires a smarter bear than me :-)

> In terms of what you can look at right now to help get going -- take a look
> at the performQuery() method of the
> org.apache.pirk.test.distributed.testsuite.DistTestSuite -- it walks you
> through the basic steps.

Yep, I found that, and it has been useful -- though I am having to step through it in a debugger to figure it out.

One of the main problems (for me) is that SystemConfiguration is used not simply to set Pirk's *implementation* options, as I would have expected (such as the defaults of paillier.useGMPForModPow, pir.primeCertainty, etc.), but also to pass values around globally that I would expect to be (only) part of the *usage* API (e.g. query.schemas), unit test data (test.inputJSONFile), and things that I would expect to be configuration of Pirk's plug-ins, such as runtime values for Hadoop, Elasticsearch, Spark, and so on. It's a real grab-bag of global values.

> What are you thinking in terms of making the providers more pluggable?
> Perhaps Responder/Querier core and Responder/Querier modules/providers?
>
> Right now, the 'providers' fall under the algorithm, algorithm -> provider
> -- i.e., org.apache.pirk.responder.wideskies.spark
> and org.apache.pirk.responder.wideskies.mapreduce under 'wideskies'. This
> could be changed to provider -> algorithm. Thus, we would have a module for
> spark and then all algorithm implementations for spark would fall under the
> spark module (and leverage the core). Thoughts?

I've not worked my way up to looking at the CLI, Hadoop, or Spark integration -- I'm still digging through the lower-level PIR algorithm implementation. They do seem to be kept out of the lower-level code, which implies they are well factored. How about pulling the CLI out of the way too?
So put types like QuerierDriver and QuerierDriverCLI (and friends) into their own package namespace? [1]

Down at the "lower level", it seems unnatural that types such as Querier have the logic for writing themselves to a file using serialization and readFromHDFSFile. I would expect types like these (Query, Querier, Response) to be capable of being exchanged in any number of formats. So if I choose to store or transmit one in BSON, or Google Protocol Buffers, or whatever, I would not expect another set of methods to deal with that on each of these classes; and when the version changes, these types have to deal with backwards compatibility, etc. So I'd be inclined to move the persistence out of these classes.

> Agree with doing some judicious refactoring of the codebase...
>
> Thanks!

Hey, it's just my 2c! I've not lived with this code; I've not even written a working example using it, so take what I say with a large pinch of salt. It is certainly not my intention, and I am in no position, to critique. It's an interesting project, and these are stream-of-consciousness thoughts as I wander around finding my bearings.

[1] Probably not worth considering separate mvn modules, yet.

Regards,
Tim
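P.S. To make the persistence point slightly more concrete, here is roughly the shape I had in mind -- a toy sketch, not a patch. The interface and class names are all invented, and the toy Querier here bears no relation to the real one beyond the name:

```java
import java.io.*;

public class PersistenceSketch {
    // Toy stand-in for Pirk's Querier: pure data, no storage logic.
    // (Fields are made up for illustration.)
    static final class Querier implements Serializable {
        private static final long serialVersionUID = 1L;
        final String queryId;
        Querier(String queryId) { this.queryId = queryId; }
    }

    // Persistence pulled out into its own, swappable concern
    // (hypothetical interface).
    interface QuerierStore {
        void write(Querier q, OutputStream out) throws IOException;
        Querier read(InputStream in) throws IOException;
    }

    // One implementation per wire format: Java serialization here; a
    // JSON or protobuf store would be a sibling class, and Querier
    // itself never changes.
    static final class JavaSerializationStore implements QuerierStore {
        @Override
        public void write(Querier q, OutputStream out) throws IOException {
            ObjectOutputStream oos = new ObjectOutputStream(out);
            oos.writeObject(q);
            oos.flush();
        }

        @Override
        public Querier read(InputStream in) throws IOException {
            try {
                return (Querier) new ObjectInputStream(in).readObject();
            } catch (ClassNotFoundException e) {
                throw new IOException(e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        QuerierStore store = new JavaSerializationStore();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        store.write(new Querier("q1"), buf);
        // Round-trips without Querier knowing anything about the format.
        Querier back = store.read(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(back.queryId);
    }
}
```

The domain type stays dumb; adding a new format (or handling an old version on read) becomes a new store implementation rather than more methods on Querier.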

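P.S. On the SystemConfiguration grab-bag point, the kind of separation I was gesturing at looks roughly like this -- purely illustrative: the class names are invented, the default values are made up, and the grouping of the real property keys is just my guess:

```java
import java.util.Properties;

public class ConfigSplitSketch {
    // Implementation tuning: Pirk's own knobs, with defaults baked in.
    // (Default values here are invented for the sketch.)
    static final class PirkOptions {
        final boolean useGMPForModPow;   // cf. paillier.useGMPForModPow
        final int primeCertainty;        // cf. pir.primeCertainty

        PirkOptions(Properties p) {
            useGMPForModPow = Boolean.parseBoolean(
                p.getProperty("paillier.useGMPForModPow", "true"));
            primeCertainty = Integer.parseInt(
                p.getProperty("pir.primeCertainty", "128"));
        }
    }

    // Usage API: values the caller supplies per query, not global state.
    static final class QuerySettings {
        final String schemas;            // cf. query.schemas
        QuerySettings(String schemas) { this.schemas = schemas; }
    }

    // Plugin runtime values (Hadoop, Spark, Elasticsearch, ...) would
    // then be owned by each plugin's adapter rather than one global table.

    public static void main(String[] args) {
        Properties p = new Properties(); // in Pirk this would be loaded from files
        PirkOptions opts = new PirkOptions(p);
        QuerySettings qs = new QuerySettings("my-schema.xml"); // placeholder name
        System.out.println(opts.primeCertainty + " " + qs.schemas);
    }
}
```

The point being that typed, scoped holders make it obvious which values are Pirk internals, which are per-call API inputs, and which belong to a plug-in.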