On 11.08.2016, at 19:43, Marshall Schor <[email protected]> wrote:
> 
> I'm working on this now.
> 
> I note that the new load(InputStream, CASMgrSerializer, CAS, boolean) method is
> "public".  Is there some code (perhaps in DKPro) that needs this form?
> 
> If not, I'll remove this method and make the reading that creates the
> CASMgrSerializer "lazy" - not done until needed.

Yep, I need something like that in DKPro.

When the type system information (TSI) is stored outside the binary CAS in a
separate file, that TSI file would otherwise have to be re-read for every CAS
file. Being able to pass the CASMgrSerializer to load() allows me to read it
only once.

> Not sure about zipping the type system - we have 3 choices, perhaps:
> 1) nothing, 2) zip, 3) custom compression zip (like the rest of form 6).
> 
> I'm leaning toward doing this work (if ever done) later.

I've been pushing that ahead since implementing the BinaryCasReader/Writer :)
Probably doesn't hurt if it gets pushed ahead a bit further.
I had a quick look at the CASMgrSerializer - you called it highly inefficient.
It doesn't look that inefficient. At least it uses primitive and String arrays
and not collections :)

> ================
> 
> I have one more question - there's a comment which I don't see implemented -
> which says that when a set of deserializations is being done with the same
> type system, the extra work to handle the type system is only done once:
> 
>   * This method avoids the repeated loading of the typesystem and index
>   * definitions from a stream when loading many CASes in a row.
> 
> How do you think that should be implemented?

Well, that's happening when I read the CASMgrSerializer from a separate file -
as explained above:

  casMgr = read(casMgrFile)
  for (file in directory) {
    load(file, casMgr, CAS, boolean)
  }
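
To make the pattern above a bit more concrete, here is a minimal,
self-contained Java sketch of "read the type system information once, then
reuse it for every CAS file". It deliberately does not use the UIMA API:
TypeSystemInfo, readTypeSystemInfo(), load() and loadAll() are hypothetical
stand-ins for the CASMgrSerializer and the load(...) method discussed here,
and the byte arrays stand in for the files on disk.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LoadOnceSketch {

    /** Hypothetical stand-in for the deserialized type system and index definitions. */
    static final class TypeSystemInfo {
        final List<String> typeNames;

        TypeSystemInfo(List<String> typeNames) {
            this.typeNames = typeNames;
        }
    }

    /** Counts how often the (potentially expensive) TSI parsing ran. */
    static int tsiReads = 0;

    /** Hypothetical parser: here the TSI is just a comma-separated list of type names. */
    static TypeSystemInfo readTypeSystemInfo(byte[] tsiBytes) {
        tsiReads++;
        String text = new String(tsiBytes, StandardCharsets.UTF_8);
        return new TypeSystemInfo(Arrays.asList(text.split(",")));
    }

    /** Hypothetical load: combines one CAS payload with the already-parsed TSI. */
    static String load(byte[] casBytes, TypeSystemInfo tsi) {
        String payload = new String(casBytes, StandardCharsets.UTF_8);
        return payload + " [" + tsi.typeNames.size() + " types]";
    }

    /** Parses the TSI once, outside the loop, and passes it to every load() call. */
    static List<String> loadAll(byte[] tsiBytes, List<byte[]> casFiles) {
        TypeSystemInfo tsi = readTypeSystemInfo(tsiBytes);
        List<String> loaded = new ArrayList<>();
        for (byte[] casBytes : casFiles) {
            loaded.add(load(casBytes, tsi));
        }
        return loaded;
    }

    public static void main(String[] args) {
        byte[] tsi = "Token,Sentence".getBytes(StandardCharsets.UTF_8);
        List<byte[]> files = List.of(
                "doc1".getBytes(StandardCharsets.UTF_8),
                "doc2".getBytes(StandardCharsets.UTF_8));
        System.out.println(loadAll(tsi, files));
        System.out.println("TSI reads: " + tsiReads);
    }
}
```

The point of the sketch is only the shape of the loop: the cost of parsing
the type system is paid once per directory rather than once per CAS file,
which is what a load() variant that accepts an already-read serializer makes
possible.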

Cheers,

-- Richard
