Hi, okay, yes, it's maybe not an ideal solution here. I think I would favor a PipedOutputStream/PipedInputStream pair with a separate thread over an in-memory DOM.
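A minimal sketch of the PipedOutputStream/PipedInputStream approach, assuming the adx-to-dxf2 conversion can write to an OutputStream and the dxf2 importer reads an InputStream. The `Converter`, `Importer` and `pipe` names are hypothetical stand-ins, not DHIS 2 APIs. The producer runs on a plain Thread rather than a pool, and memory use is bounded by the fixed pipe buffer (64 KiB here) rather than by the size of the import file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

class PipedConversion {

    /** Hypothetical stand-in for the adx-to-dxf2 converter: writes dxf2 XML to out. */
    interface Converter {
        void convert(OutputStream out) throws IOException;
    }

    /** Hypothetical stand-in for the dxf2 import: consumes the dxf2 InputStream. */
    interface Importer<T> {
        T importFrom(InputStream in) throws IOException;
    }

    static <T> T pipe(Converter converter, Importer<T> importer)
            throws IOException, InterruptedException {
        // Fixed-size pipe buffer: memory use does not grow with the input size.
        PipedInputStream in = new PipedInputStream(64 * 1024);
        PipedOutputStream out = new PipedOutputStream(in);
        AtomicReference<Throwable> producerError = new AtomicReference<>();

        // One plain thread per import writes the converted dxf2 into the pipe.
        Thread producer = new Thread(() -> {
            try (OutputStream o = out) { // closing the write end signals EOF to the reader
                converter.convert(o);
            } catch (Throwable t) {
                producerError.set(t);
            }
        }, "adx-to-dxf2-pipe");
        producer.start();

        T result;
        // The calling thread consumes the pipe; closing the read end also makes the
        // producer's next write fail if the importer stops early.
        try (InputStream i = in) {
            result = importer.importFrom(i);
        }
        producer.join();
        if (producerError.get() != null) {
            throw new IOException("adx to dxf2 conversion failed", producerError.get());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String echoed = pipe(
                out -> out.write("<dataValueSet/>".getBytes(StandardCharsets.UTF_8)),
                in -> new String(in.readAllBytes(), StandardCharsets.UTF_8));
        System.out.println(echoed);
    }
}
```

Because the producer always closes its end, a failed conversion shows up to the importer as a truncated stream, and on the normal path the producer's error is rethrown after join(), so a partial import is not silently reported as success.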
Do we really need a separate thread pool? We fork off threads in many places in the system already, e.g. with parallel analytics queries. I thought that as long as it's limited to one or a few per process, it should be handled by the JVM. But I might be wrong.

On Thu, Jun 18, 2015 at 8:46 PM, Bob Jolliffe <bobjolli...@gmail.com> wrote:
> Hi Lars
>
> The problem is that the dataValueSetService requires an inputstream to
> feed off. There are only two ways to provide an inputstream that I can
> think of: either create a pipe or a buffer (e.g. with a string).
>
> Creating a pipe is doable, but then you also need to create a separate
> thread to read it, which is another resource to manage (e.g. with a pool),
> and that seemed like more effort than it is worth.
>
> What I can do short term as a defensive measure is to place a limit on
> the number of datavalues which can be buffered for a single
> datavalueset. That way it should not be possible to explode the
> memory. I'll do that soon.
>
> Note that in "normal" use this should not be a problem, as a single adx
> group corresponds to the data for one orgunit for one period; what
> is envisaged typically is a single dataset's worth.
>
> The other "alternative" is not to use the datavalueSetService at all
> but just duplicate the code.
>
> Bob
>
> On 18 June 2015 at 15:22, Lars Helge Øverland <larshe...@gmail.com> wrote:
> > Hi Bob,
> >
> > as you say, this creates a hard limit on memory. Now all it will take to
> > bring down a DHIS 2 instance is to submit a sufficiently large import
> > file. Seems like this will provide headaches for server admins ;) Can we
> > find a stream-based solution which scales well?
> >
> > Lars
> >
> > On Thu, Jun 18, 2015 at 2:49 PM, Bob Jolliffe <bobjolli...@gmail.com> wrote:
> >>
> >> WIP committed, and a slight adjustment of strategy ...
> >>
> >> I was not comfortable with creating a new thread just to pipe from adx to
> >> dxf.
> >>
> >> So instead, for each adx group corresponding to a dataValueSet with
> >> orgUnit, period (and potentially attributeOptionCombo), I create a
> >> dataValueSet DOM document and present that to the dxf2 stream importer
> >> as a stream. Given that this data is bound by a single orgunit and
> >> period, I don't think the DOM document is going to break the memory
> >> bank.
> >>
> >> Basic conversion to dxf2 is working fine.
> >>
> >> Next task is to "implode" the categories.
> >>
> >> A luta continua (the struggle continues).
> >>
> >> On 12 June 2015 at 13:40, Bob Jolliffe <bobjolli...@gmail.com> wrote:
> >> > Hi
> >> >
> >> > As you have seen, I have already started to commit a few bits of code
> >> > in support of the ADX implementation. I hadn't been planning to do
> >> > this, so will proceed quite slowly, but let me outline the approach I
> >> > am considering for your comment and suggestion.
> >> >
> >> > 1. Currently we have a datavalueset service which can import dxf2 data
> >> > from an inputstream.
> >> >
> >> > 2. I would like to use that existing service and place the adx
> >> > service as a thin veneer above it rather than create a lot of
> >> > duplicated code.
> >> >
> >> > 3. The adx data importer would read its adx input from a stream and
> >> > convert that into a dxf2 stream. The main tasks it would need to
> >> > perform are:
> >> > (i) convert periods into dxf2 format
> >> > (ii) look up catoptcombos and attributeoptioncombos for the dimensions
> >> > in the adx message
> >> > All other attributes and ImportOptions would be passed through
> >> > directly to the dxf2 datavalueset service.
> >> >
> >> > 4. In order to present the resulting dxf2 to the service as an
> >> > InputStream, it would have to use a PipedReader/PipedWriter combination
> >> > (something Lars will recall from earlier dxf1 code). The equivalent
> >> > alternative would be to post the dxf2 datasets back out to the REST
> >> > endpoint, but that seems wasteful and more awkward.
> >> >
> >> > Does that approach sound reasonable?
> >> >
> >> > I have some lingering uncertainty about the best way to deal with
> >> > ImportSummary. The adx data is naturally grouped by orgunit/period,
> >> > so I would likely split the stream and post each group as a separate
> >> > dxf2 datavalueset. This would probably imply collecting the results
> >> > into an <ImportSummaries ... /> element. ADX is currently silent on
> >> > the result message, as it deliberately does not define the transaction
> >> > (just the message), so we have some latitude here to do whatever is
> >> > best. The above is my best suggestion.
> >> >
> >> > Cheers
> >> > Bob
> >>
> >> --
> >> Mailing list: https://launchpad.net/~dhis2-devs-core
> >> Post to : dhis2-devs-core@lists.launchpad.net
> >> Unsubscribe : https://launchpad.net/~dhis2-devs-core
> >> More help : https://help.launchpad.net/ListHelp
> >
> > --
> > Lars Helge Øverland
> > Lead developer, DHIS 2
> > University of Oslo
> > Skype: larshelgeoverland
> > http://www.dhis2.org
>

--
Lars Helge Øverland
Lead developer, DHIS 2
University of Oslo
Skype: larshelgeoverland
http://www.dhis2.org <https://www.dhis2.org>
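The short-term defensive measure Bob describes (capping how many data values may be buffered for a single datavalueset) could look roughly like the sketch below. It is an illustration under assumptions: `BoundedDataValueSetBuffer` and its methods are hypothetical, the dxf2 namespace and the `dataValue`, `dataElement`, `categoryOptionCombo` and `value` names are assumed rather than taken from this thread, and identifiers such as `OU1` are placeholders.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class BoundedDataValueSetBuffer {
    private final int maxValues;
    private final StringBuilder xml = new StringBuilder();
    private int count = 0;

    /** Starts a dataValueSet for one orgunit/period group (hypothetical class). */
    BoundedDataValueSetBuffer(String orgUnit, String period, int maxValues) {
        this.maxValues = maxValues;
        // Namespace and attribute names are assumptions about the dxf2 format.
        xml.append("<dataValueSet xmlns=\"http://dhis2.org/schema/dxf/2.0\"")
           .append(" orgUnit=\"").append(escape(orgUnit)).append('"')
           .append(" period=\"").append(escape(period)).append("\">");
    }

    /** Buffers one data value, refusing to grow past maxValues. */
    void add(String dataElement, String categoryOptionCombo, String value) {
        if (count >= maxValues) {
            throw new IllegalStateException(
                    "adx group exceeds the limit of " + maxValues + " data values");
        }
        xml.append("<dataValue dataElement=\"").append(escape(dataElement))
           .append("\" categoryOptionCombo=\"").append(escape(categoryOptionCombo))
           .append("\" value=\"").append(escape(value)).append("\"/>");
        count++;
    }

    /** Returns the complete document without modifying the buffer. */
    String toXml() {
        return xml + "</dataValueSet>";
    }

    /** Presents the buffered group to a stream-based importer. */
    InputStream toInputStream() {
        return new ByteArrayInputStream(toXml().getBytes(StandardCharsets.UTF_8));
    }

    /** Minimal XML attribute escaping; '&' must be replaced first. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        BoundedDataValueSetBuffer group = new BoundedDataValueSetBuffer("OU1", "201506", 1000);
        group.add("DE1", "COC1", "5");
        System.out.println(group.toXml());
    }
}
```

With one buffer per orgunit/period group, memory per group is bounded by maxValues rather than by the size of the whole file, although a file with many groups is still processed one buffered group at a time rather than as a true stream.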
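Step 3(i) of the original plan, converting adx periods into dxf2 format, can be sketched with java.time. This assumes adx periods arrive as ISO 8601 start/duration pairs such as 2015-06-01/P1M, and that the target DHIS 2 period identifiers are yyyyMMdd, yyyyMM, yyyyQn and yyyy; `AdxPeriodConverter` is a hypothetical name, and any other period type is rejected rather than guessed.

```java
import java.time.LocalDate;
import java.time.Period;
import java.util.Locale;

class AdxPeriodConverter {

    /** Converts an assumed adx "start/duration" period into a DHIS 2 period id. */
    static String toDxf2(String adxPeriod) {
        String[] parts = adxPeriod.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected start/duration: " + adxPeriod);
        }
        LocalDate start = LocalDate.parse(parts[0]);   // e.g. 2015-06-01
        Period duration = Period.parse(parts[1]);      // e.g. P1M
        int year = start.getYear();
        int month = start.getMonthValue();

        if (duration.equals(Period.ofDays(1))) {
            return String.format(Locale.ROOT, "%04d%02d%02d", year, month, start.getDayOfMonth());
        }
        if (duration.equals(Period.ofMonths(1)) && start.getDayOfMonth() == 1) {
            return String.format(Locale.ROOT, "%04d%02d", year, month);
        }
        // Quarters must start on the first day of January, April, July or October.
        if (duration.equals(Period.ofMonths(3)) && start.getDayOfMonth() == 1
                && (month - 1) % 3 == 0) {
            return String.format(Locale.ROOT, "%04dQ%d", year, (month - 1) / 3 + 1);
        }
        if (duration.equals(Period.ofYears(1)) && start.getDayOfYear() == 1) {
            return String.format(Locale.ROOT, "%04d", year);
        }
        throw new IllegalArgumentException("unsupported adx period: " + adxPeriod);
    }

    public static void main(String[] args) {
        System.out.println(toDxf2("2015-06-01/P1M")); // 201506
        System.out.println(toDxf2("2015-04-01/P3M")); // 2015Q2
    }
}
```

Rejecting misaligned periods (e.g. a quarter starting in May) keeps the conversion lossless; weekly and financial-year periods would need their own rules.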