On 18/07/16 19:29, Ellison Anne Williams wrote:
> Good points.
>
> Yes, there are currently no examples included in the codebase (other than
> in a roundabout kind of way via the tests, which doesn't count) and the
> website doesn't have a step-by-step walkthrough of how to get up and
> running from a user perspective. We can certainly add these in.
Apologies if I am being a pain here. I'm just curious about how to use Pirk, and come at it with no background -- so I may not be the best target user. I won't be offended if you tell me that it requires a smarter bear than me :-)

> In terms of what you can look at right now to help get going -- take a look
> at the performQuery() method of the
> org.apache.pirk.test.distributed.testsuite.DistTestSuite -- it walks you
> through the basic steps.

Yep, I found that, and it has been useful -- though I am having to step through it in a debugger to figure it out.

One of the main problems (for me) is that SystemConfiguration is used not simply to set Pirk's *implementation* options, as I would have expected (such as the defaults of paillier.useGMPForModPow, pir.primeCertainty, etc.), but also to pass values around globally that I would expect to be (only) part of the *usage* API (e.g. query.schemas), unit test data (test.inputJSONFile), and things that I would expect to be configuration of Pirk's plug-ins, such as runtime values for Hadoop, Elasticsearch, Spark, and so on. It's a real grab-bag of global values.

> What are you thinking in terms of making the providers more pluggable?
> Perhaps Responder/Querier core and Responder/Querier modules/providers?
>
> Right now, the 'providers' fall under the algorithm, algorithm -> provider
> -- i.e., org.apache.pirk.responder.wideskies.spark
> and org.apache.pirk.responder.wideskies.mapreduce under 'wideskies'. This
> could be changed to provider -> algorithm. Thus, we would have a module for
> spark and then all algorithm implementations for spark would fall under the
> spark module (and leverage the core). Thoughts?

I've not worked my way up to looking at the CLI, Hadoop, or Spark integration -- I'm still digging through the lower-level PIR algorithm implementation. They do seem to be kept out of the lower-level code, which implies they are well factored. How about pulling the CLI out of the way too?
So put types like QuerierDriver and QuerierDriverCLI (and friends) into their own package namespace? [1]

Down at the "lower level", it seems unnatural that types such as Querier have the logic for writing themselves to a file using serialization and readFromHDFSFile. I would expect types like these (Query, Querier, Response) to be capable of being exchanged in any number of formats. So if I choose to store or transmit one in BSON, or Google Protocol Buffers, or whatever, I would not expect another set of methods to deal with that on each of these classes; and when the version changes, these types have to deal with backwards compatibility, etc. So I'd be inclined to move the persistence out of these classes.

> Agree with doing some judicious refactoring of the codebase...
>
> Thanks!

Hey, it's just my 2c! I've not lived with this code; I've not even written a working example using it, so take what I say with a large pinch of salt. It is certainly not my intention, and I am in no position, to critique. It's an interesting project, and these are stream-of-consciousness thoughts as I wander around finding my bearings.

[1] Probably not worth considering separate mvn modules, yet.

Regards,
Tim
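P.S. To make the persistence point slightly more concrete, here is roughly the shape I had in mind -- a toy sketch, not a patch. The interface and class names are all invented, and the toy Querier here bears no relation to the real one beyond the name:

```java
import java.io.*;

public class PersistenceSketch {
    // Toy stand-in for Pirk's Querier: pure data, no storage logic.
    // (Fields are made up for illustration.)
    static final class Querier implements Serializable {
        private static final long serialVersionUID = 1L;
        final String queryId;
        Querier(String queryId) { this.queryId = queryId; }
    }

    // Persistence pulled out into its own, swappable concern
    // (hypothetical interface).
    interface QuerierStore {
        void write(Querier q, OutputStream out) throws IOException;
        Querier read(InputStream in) throws IOException;
    }

    // One implementation per wire format: Java serialization here; a
    // JSON or protobuf store would be a sibling class, and Querier
    // itself never changes.
    static final class JavaSerializationStore implements QuerierStore {
        @Override
        public void write(Querier q, OutputStream out) throws IOException {
            ObjectOutputStream oos = new ObjectOutputStream(out);
            oos.writeObject(q);
            oos.flush();
        }

        @Override
        public Querier read(InputStream in) throws IOException {
            try {
                return (Querier) new ObjectInputStream(in).readObject();
            } catch (ClassNotFoundException e) {
                throw new IOException(e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        QuerierStore store = new JavaSerializationStore();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        store.write(new Querier("q1"), buf);
        // Round-trips without Querier knowing anything about the format.
        Querier back = store.read(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(back.queryId);
    }
}
```

The domain type stays dumb; adding a new format (or handling an old version on read) becomes a new store implementation rather than more methods on Querier.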

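P.S. On the SystemConfiguration grab-bag point, the kind of separation I was gesturing at looks roughly like this -- purely illustrative: the class names are invented, the default values are made up, and the grouping of the real property keys is just my guess:

```java
import java.util.Properties;

public class ConfigSplitSketch {
    // Implementation tuning: Pirk's own knobs, with defaults baked in.
    // (Default values here are invented for the sketch.)
    static final class PirkOptions {
        final boolean useGMPForModPow;   // cf. paillier.useGMPForModPow
        final int primeCertainty;        // cf. pir.primeCertainty

        PirkOptions(Properties p) {
            useGMPForModPow = Boolean.parseBoolean(
                p.getProperty("paillier.useGMPForModPow", "true"));
            primeCertainty = Integer.parseInt(
                p.getProperty("pir.primeCertainty", "128"));
        }
    }

    // Usage API: values the caller supplies per query, not global state.
    static final class QuerySettings {
        final String schemas;            // cf. query.schemas
        QuerySettings(String schemas) { this.schemas = schemas; }
    }

    // Plugin runtime values (Hadoop, Spark, Elasticsearch, ...) would
    // then be owned by each plugin's adapter rather than one global table.

    public static void main(String[] args) {
        Properties p = new Properties(); // in Pirk this would be loaded from files
        PirkOptions opts = new PirkOptions(p);
        QuerySettings qs = new QuerySettings("my-schema.xml"); // placeholder name
        System.out.println(opts.primeCertainty + " " + qs.schemas);
    }
}
```

The point being that typed, scoped holders make it obvious which values are Pirk internals, which are per-call API inputs, and which belong to a plug-in.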