Hi,

At the risk of the thread becoming too nested, please find the answers inline below.
On Wed, Jul 20, 2016 at 6:31 AM, Tim Ellison <[email protected]> wrote:

> On 18/07/16 19:29, Ellison Anne Williams wrote:
> > Good points.
> >
> > Yes, there are currently no examples included in the codebase (other than
> > in a roundabout kind of way via the tests, which doesn't count) and the
> > website doesn't have a step-by-step walkthrough of how to get up and
> > running from a user perspective. We can certainly add these in.
>
> Apologies if I am being a pain here. I'm just curious about how to use
> Pirk, and come at it with no background -- so I may not be the best
> target user. I won't be offended if you tell me that it requires a
> smarter bear than me :-)

EAW: Don't be silly :) Yes, we can and should have user docs and examples - I will open two JIRA issues now.

> > In terms of what you can look at right now to help get going -- take a look
> > at the performQuery() method of the
> > org.apache.pirk.test.distributed.testsuite.DistTestSuite -- it walks you
> > through the basic steps.
>
> Yep, I found that, and it has been useful -- though I am having to step
> through in a debugger to figure it out.
>
> One of the main problems (for me) is that the SystemConfiguration is
> used not simply to set Pirk's *implementation* options as I would have
> expected (such as the defaults of paillier.useGMPForModPow,
> pir.primeCertainty, etc), but it is also used to pass values around
> globally that I would expect to be (only) part of the *usage* API (e.g.
> query.schemas), and unit test data (test.inputJSONFile), and things that
> I would expect to be configuration of Pirk's plug-ins, such as runtime
> values for Hadoop, Elasticsearch, Spark, and ...
>
> It's a real grab-bag of global values.

EAW: Yes, you nailed it -- it's a grab bag right now (no, not the best coding practice...). Let's discuss a better model. Any thoughts on this? Also, we should probably have multiple properties files rather than one gigantic pirk.properties file. With multiple Responder providers, we should probably have a separate properties file for each, holding the provider-specific properties (i.e., specific only to Storm or Spark or whatever). Realize too that there are many CLI options for the Responder and Querier drivers that are not yet in the properties file...
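To make the separate-properties-files idea a bit more concrete, here is a rough sketch of the direction I'm thinking of. The file names and the loader class below are hypothetical (this is not how Pirk is wired today); the point is just to layer a core file plus optional per-provider files:

    import java.io.InputStream;
    import java.util.Properties;

    // Sketch only: load a core properties file plus one optional properties
    // file per Responder provider; later files win on key conflicts.
    public class LayeredPropertiesLoader
    {
      public static Properties load() throws Exception
      {
        Properties props = new Properties();

        loadInto(props, "pirk-core.properties");           // core Pirk defaults
        loadInto(props, "responder-mapreduce.properties"); // provider-specific
        loadInto(props, "responder-spark.properties");     // provider-specific
        loadInto(props, "responder-storm.properties");     // provider-specific

        return props;
      }

      private static void loadInto(Properties props, String resource) throws Exception
      {
        // A provider's file may simply not be on the classpath; skip it quietly.
        try (InputStream in = LayeredPropertiesLoader.class.getClassLoader().getResourceAsStream(resource))
        {
          if (in != null)
          {
            props.load(in);
          }
        }
      }
    }

The Responder/Querier driver CLI options could then just be overlaid on top of whatever comes out of the files.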
> > What are you thinking in terms of making the providers more pluggable?
> > Perhaps Responder/Querier core and Responder/Querier modules/providers?
> >
> > Right now, the 'providers' fall under the algorithm, algorithm -> provider
> > -- i.e., org.apache.pirk.responder.wideskies.spark
> > and org.apache.pirk.responder.wideskies.mapreduce under 'wideskies'. This
> > could be changed to provider -> algorithm. Thus, we would have a module for
> > spark, and then all algorithm implementations for spark would fall under
> > the spark module (and leverage the core). Thoughts?
>
> I've not worked my way up to looking at the CLI, Hadoop, Spark
> integration -- I'm still digging through the lower-level PIR algorithms
> impl. They do seem to be kept out of the lower-level code, which implies
> they are well factored.
>
> How about pulling the CLI out of the way too? So put types like
> QuerierDriver and QuerierDriverCLI + friends into their own package
> namespace? [1]

EAW: I am in favor of this...

> Down at the "lower level", it seems unnatural that types such as
> Querier have the logic for writing themselves to a file using
> serialization and readFromHDFSFile.
>
> I would expect types like that (Query, Querier, Response) to be
> capable of being exchanged in any number of formats. So if I choose to
> store or transmit them in BSON, or Google Protocol Buffers, or whatever, I
> would not expect to have another set of methods to deal with that on
> each of these classes; and when the version changes, these types have to
> deal with backwards compatibility, etc. etc. So I'd be inclined to move
> the persistence out of these classes.

EAW: Yes, I agree. The initial use cases were file-based, and you see that reflected in the current code. It needs to evolve to include other transport formats and mechanisms -- a rough sketch of what I mean is at the end of this mail.

> > Agree with doing some judicious refactoring of the codebase...
> >
> > Thanks!
>
> Hey, it's just my 2c! I've not lived with this code, I've not even
> written a working example using it, so take what I say with a large
> pinch of salt. It is certainly not my intention, and I am in no
> position, to critique.
>
> It's an interesting project, and these are stream-of-consciousness
> thoughts as I wander around finding my bearings.

EAW: Happy to have stream of consciousness - keep it coming! :)

> [1] Probably not worth considering separate mvn modules, yet.
>
> Regards,
> Tim
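EAW: PS - as promised, a rough sketch of what I mean by moving persistence out of the domain classes. The interface and implementation names below are hypothetical (not existing Pirk code, and Querier here just stands in for the existing class); the idea is that each format/transport becomes its own implementation rather than another set of methods on Query/Querier/Response:

    import java.io.InputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.OutputStream;

    // Sketch only: persistence lives behind an interface, not on Querier itself.
    public interface QuerierStorage
    {
      void write(Querier querier, OutputStream out) throws Exception;

      Querier read(InputStream in) throws Exception;
    }

    // One implementation per format/transport -- plain Java serialization here;
    // BSON, protobuf, HDFS-backed files, etc. would be further implementations,
    // and versioning/backwards compatibility would live in this layer too.
    class JavaSerializationQuerierStorage implements QuerierStorage
    {
      @Override
      public void write(Querier querier, OutputStream out) throws Exception
      {
        try (ObjectOutputStream oos = new ObjectOutputStream(out))
        {
          oos.writeObject(querier);
        }
      }

      @Override
      public Querier read(InputStream in) throws Exception
      {
        try (ObjectInputStream ois = new ObjectInputStream(in))
        {
          return (Querier) ois.readObject();
        }
      }
    }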
