Re: Concatenating TERSEd data?

Tony Harminc Fri, 17 Oct 2008 12:10:37 -0700

2008/10/17 Tim Hare <[EMAIL PROTECTED]>:
> We need to TERSE a fairly large (for us) amount of data. This data is in
> multiple separate datasets now, but needs to be sent as one large sequential
> dataset.  We can TERSE the concatenated sequential input of course; but out
> of curiosity I'm wondering: can you TERSE the individual components,
> concatenate the results via IEBGENER, and then UNTERSE the resulting file on
> the other end?


It's trivial to try, but I very much doubt it...

> From what I remember about Lempel-Ziv, the "dictionary" is built as you go
> along but it might mean that the second and subsequent files concatenated
> would be read with incomplete information, resulting in erroneous
> decompression results?

Terse appears to be Lempel-Ziv-Wegman (or Welch, depending on whose
expired patent you prefer W to stand for), but it is a particular
implementation of a general algorithm, and there are header and
trailer records, both undocumented. By inspection, the header is a
pretty straightforward 12-byte piece that describes both some original
dataset characteristics and some encoding method info, but the trailer
is longer and less obvious. It looks to me as though the trailer is
just informational, but I don't know if it contains enough information
to be skipped over reliably.

Regardless, the dictionary after the first compress/decompress
operation would not be the same as the initial dictionary, and you
would have no way to tell the decompressor to start with a virgin
dictionary.
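
To illustrate the dictionary problem, here is a minimal sketch using a
textbook LZW coder in Python. It is not the TERSE format, whose code
width, dictionary handling, and header/trailer layout are undocumented;
the function names are made up for the illustration. Two inputs are
compressed separately, the code streams are concatenated, and a single
decompressor is run over the result.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: emit one integer code per longest known prefix."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    out = []
    w = b""
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            table[wc] = next_code  # dictionary grows as input is read
            next_code += 1
            w = bytes([b])
    if w:
        out.append(table[w])
    return out


def lzw_decompress(codes: list[int]) -> bytes:
    """Textbook LZW decoder; rebuilds the dictionary as it goes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            entry = w + w[:1]  # the code being defined right now
        else:
            raise ValueError(f"unknown code {code}")
        out.append(entry)
        table[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b"".join(out)


first = b"ABABABAB"
second = b"XYXYXYXY"
codes_first = lzw_compress(first)
codes_second = lzw_compress(second)

# Each stream decompresses correctly on its own...
separate = lzw_decompress(codes_first) + lzw_decompress(codes_second)

# ...but one decoder run over the concatenation carries the first
# stream's dictionary into the second, so codes >= 256 mean something else.
joined = lzw_decompress(codes_first + codes_second)

print(separate == first + second)
print(joined == first + second)
```

In this toy case the second comparison fails: the second stream's
multi-byte codes are resolved against entries built from the first
input. A real decoder might equally well hit an undefined code and
stop.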

Without knowing much about the encoding, you could terse and
concatenate the segments, and then at the other end run a splitter
program to scan through the compressed data looking for headers, and
invoke the deterse for each segment. Unfortunately the headers are
not uniquely identifiable, i.e. there is no eyecatcher, and a
syntactically correct header could occur within the compressed data
stream. So your splitter program would have to scan forward from the
13th byte, treating the data stream as 12-bit chunks, until it reaches
a zero chunk, indicating logical EOF, then figure out how to skip over
the trailer, which doesn't appear to contain its own length, and scan
for the next header. It's always possible AMATERSE already does this.
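
The scanning step could be sketched roughly as follows in Python. The
12-byte header, the 12-bit chunks, and the zero chunk as logical EOF
come from the inspection described above. The packing order (two
chunks per three bytes, most significant bits first) is purely an
assumption, as is the function name. Skipping the trailer is not
attempted, since its layout is unknown.

```python
def find_logical_eof(buf: bytes, header_len: int = 12):
    """Scan 12-bit chunks after the header and stop at the first zero chunk.

    Returns (chunk_index, end_offset), where end_offset is the first whole
    byte after the zero chunk, or None if no zero chunk is found.
    ASSUMPTION: chunks are packed MSB-first, two per three bytes.
    """
    total_bits = (len(buf) - header_len) * 8
    for i in range(total_bits // 12):
        bit = i * 12
        byte = header_len + bit // 8
        if bit % 8 == 0:
            # Chunk starts on a byte boundary: 8 bits + high nibble of next.
            chunk = (buf[byte] << 4) | (buf[byte + 1] >> 4)
        else:
            # Chunk starts mid-byte: low nibble + the whole next byte.
            chunk = ((buf[byte] & 0x0F) << 8) | buf[byte + 1]
        if chunk == 0:
            end_offset = header_len + (bit + 12 + 7) // 8
            return i, end_offset
    return None


# Made-up sample: a dummy 12-byte header followed by the chunks
# 0x123, 0x456, 0x000, 0xABC packed under the assumption above.
sample = b"H" * 12 + bytes([0x12, 0x34, 0x56, 0x00, 0x0A, 0xBC])
print(find_logical_eof(sample))
```

Whatever follows end_offset would be the trailer, and the splitter
would still need some rule for finding where it stops.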

Another approach might be to put the original multiple datasets into
members of a PDS, and terse that with AMATERSE, which understands
PDS[E]s. After the deterse, you would have an identical PDS, which
could be easily turned back into a sequential dataset. Or run a DSS
dump selecting your datasets, terse the output of that, then deterse
and DSS restore at the other end.

Tony H.
