Marvin Humphrey wrote:
>> Shouldn't segmeta itself have a format too?
>
> Yes -- it's in there, just under the "segmeta" key rather than at the root
> level.
Whoops, missed it. Good.
>> Are you going to provide utility APIs that components can use to deal
>> with the format number?
>
> A good plan. DataWriter already has two relevant methods.
>
> /** Create a Hash of arbitrary metadata to be serialized and stored
> * by the Segment. The default implementation supplies a Hash with
> * a single key-value pair for "format".
> */
> public incremented Hash*
> Metadata(DataWriter *self);
>
> /** Every writer must specify a file format revision number, which should
> * increment each time the format changes. Responsibility for revision
> * checking is left to the companion DataReader.
> */
> public abstract i32_t
> Format(DataWriter *self);
What does "incremented" mean?
>> e.g. so a component can register the N formats it's able to deal with,
>> so a consistent error is thrown if a format is too old or too new,
>> etc.
>
> Haven't got standardized methods to perform format checking in DataReader yet.
> How do these look?
>
> /** Throw an error unless the supplied format version is at least
> * <code>min</code> and no more than <code>max</code>.
> *
> * @param format Format version.
> * @param min Minimum supported format version, which must be at least 1.
> * @param max Maximum supported format version, which must be at least 1.
> * @return the validated format version.
> */
> public i32_t
> Validate_Format(DataReader *self, i32_t format, i32_t min, i32_t max);
>
> /** Attempt to extract a "format" value from the supplied metadata Hash.
> * If extraction succeeds, calls Validate_Format().
> *
> * @param metadata Segment metadata for this component; may be NULL.
> * @return either the return value of Validate_Format() or 0 (an invalid
> * format value).
> */
> i32_t
> Check_Format(DataReader *self, Hash *metadata, i32_t min, i32_t max);
>
> Note that Validate_Format() is public, but that Check_Format(), which would be
> used by core components, is not.
>
> Implementation code (unverified):
>
> i32_t
> DataReader_validate_format(DataReader *self, i32_t format,
> i32_t min, i32_t max)
> {
> if (format < min) {
> THROW("Format version '%i32' is less than the minimum "
> "supported version '%i32' for %o", format, min,
> DataReader_Get_Class_Name(self));
> }
> else if (format > max) {
> THROW("Format version '%i32' is greater than the maximum "
> "supported version '%i32' for %o", format, max,
> DataReader_Get_Class_Name(self));
> }
> return format;
> }
>
> i32_t
> DataReader_check_format(DataReader *self, Hash *metadata,
> i32_t min, i32_t max)
> {
> i32_t version = 0;
> if (metadata) {
> Obj *format = Hash_Fetch_Str(metadata, "format", 6);
> if (format) {
> version = DataReader_Validate_Format(self, (i32_t)Obj_To_I64(format),
> min, max);
> }
> }
> return version;
> }
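For anyone who wants to try the range check outside of Lucy, here is a minimal plain-C sketch of the same validate/check logic. It is an assumption-laden stand-in, not the proposed implementation: the tiny MetaEntry/Metadata key-value array is a hypothetical replacement for Hash and Hash_Fetch_Str(), and since THROW is a Lucy macro, the hypothetical validate_format() writes its message into a caller-supplied buffer and returns 0 instead of throwing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the metadata Hash; Lucy would use
 * Hash_Fetch_Str() instead of this key/value array. */
typedef struct {
    const char *key;
    int64_t     value;
} MetaEntry;

typedef struct {
    const MetaEntry *entries;
    size_t           count;
} Metadata;

/* Look up an integer value by key; returns 1 if found, 0 otherwise. */
static int
meta_fetch(const Metadata *meta, const char *key, int64_t *out)
{
    for (size_t i = 0; i < meta->count; i++) {
        if (strcmp(meta->entries[i].key, key) == 0) {
            *out = meta->entries[i].value;
            return 1;
        }
    }
    return 0;
}

/* Range check. Where the proposal would THROW, this sketch writes the
 * message into `err` and returns 0 (an invalid format value). The
 * format is taken as int64_t so out-of-range values are rejected
 * before any narrowing to int32_t. */
static int32_t
validate_format(const char *class_name, int64_t format,
                int32_t min, int32_t max, char *err, size_t err_len)
{
    assert(err != NULL && err_len > 0);
    err[0] = '\0';
    if (format < min) {
        snprintf(err, err_len,
                 "Format version '%lld' is less than the minimum "
                 "supported version '%d' for %s",
                 (long long)format, (int)min, class_name);
        return 0;
    }
    if (format > max) {
        snprintf(err, err_len,
                 "Format version '%lld' is greater than the maximum "
                 "supported version '%d' for %s",
                 (long long)format, (int)max, class_name);
        return 0;
    }
    return (int32_t)format;
}

/* Fetch "format" from the metadata (which may be NULL) and validate it.
 * Missing metadata or a missing key yields 0 with an empty `err`. */
static int32_t
check_format(const char *class_name, const Metadata *meta,
             int32_t min, int32_t max, char *err, size_t err_len)
{
    int64_t raw;
    assert(err != NULL && err_len > 0);
    err[0] = '\0';
    if (meta == NULL || !meta_fetch(meta, "format", &raw)) {
        return 0;
    }
    return validate_format(class_name, raw, min, max, err, err_len);
}
```

Taking the format as int64_t mirrors Obj_To_I64() and means a wildly out-of-range value is rejected rather than silently truncated to 32 bits.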
Looks good, though I might add a way for a given module to register
the versions it reads and writes (presumably it writes only the most
recent one); then min/max can be derived from what was registered.
This could also help with introspection: instead of just seeing
"format 2", a tool could decode that to a string describing what
format 2 was (e.g. "added omitTermFreqAndPositions capability").
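The registration idea could look something like the following C sketch. FormatInfo, FormatRegistry and the registry_* functions are hypothetical names made up for illustration; nothing like them is described as existing in KS or Lucy. Each component lists the versions it can read with a short description, writes only the newest one, and derives min/max from the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registry entry: a format version plus a human-readable
 * description of what changed in it. */
typedef struct {
    int32_t     version;
    const char *description;
} FormatInfo;

/* Hypothetical per-component registry listing every version the
 * component can read. */
typedef struct {
    const char       *component;
    const FormatInfo *formats;
    size_t            num_formats;
} FormatRegistry;

/* Derive the min/max pair to hand to a range check. */
static void
registry_range(const FormatRegistry *reg, int32_t *min, int32_t *max)
{
    assert(reg->num_formats > 0);
    *min = *max = reg->formats[0].version;
    for (size_t i = 1; i < reg->num_formats; i++) {
        int32_t v = reg->formats[i].version;
        if (v < *min) { *min = v; }
        if (v > *max) { *max = v; }
    }
}

/* The version a writer should emit: the newest one registered. */
static int32_t
registry_write_version(const FormatRegistry *reg)
{
    int32_t min, max;
    registry_range(reg, &min, &max);
    return max;
}

/* Membership test: unlike a plain min/max check, this also rejects
 * unregistered versions that fall inside the range. */
static int
registry_can_read(const FormatRegistry *reg, int32_t version)
{
    for (size_t i = 0; i < reg->num_formats; i++) {
        if (reg->formats[i].version == version) { return 1; }
    }
    return 0;
}

/* Introspection: map a version number to its description, or NULL. */
static const char *
registry_describe(const FormatRegistry *reg, int32_t version)
{
    for (size_t i = 0; i < reg->num_formats; i++) {
        if (reg->formats[i].version == version) {
            return reg->formats[i].description;
        }
    }
    return NULL;
}
```

A component might register { 1, "initial format" } and { 2, "added omitTermFreqAndPositions capability" } (the first description is a made-up placeholder). registry_range() would then yield 1..2 for Validate_Format(), registry_write_version() would return 2, and registry_describe() supplies the introspection string. registry_can_read() exists because, if old versions are ever retired, a plain min/max check would accept an unregistered version that happens to fall inside the range.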
> It might make sense to throw specific exception classes in Lucy. I haven't
> worked out anything like that in KS, for three reasons. First, it's hard to
> catch exceptions from C without leaking memory. Second, Perl's try-catch
> mechanism isn't very elegant. Third, faking up a try-catch-finally interface
> in C that would be abstract enough to handle all potential host
> exception-handling mechanisms is, uh, challenging.
This sounds very difficult!
> The only caught exceptions in the KS core happen in IndexReader's open()
> command, due to the lockless opening code and for reasons you are no doubt
> familiar with. ;) All other errors are fatal.
I think I might know. So, that answers my earlier question about the
snapshots file.
> However, we could create full-fledged exception objects for Lucy, so that
> THROW calls might look something like this:
>
> THROW(Err_data_component_version, /* <--- An integer error id */
> "Format version '%i32' is less than the minimum "
> "supported version '%i32' for %o", format, min,
> DataReader_Get_Class_Name(self));
>
> The exception objects generated by THROW calls do not have to subclass
> Lucy::Obj, because we will always be returning control to the host. So, they
> could be, for example, plain old Java Exception subclasses.
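As a rough illustration of an exception object that carries an integer error id, here is a pure-C sketch that uses setjmp()/longjmp() in place of a host exception mechanism. ErrInfo, throw_err(), call_and_catch() and open_component() are hypothetical, and the variadic THROW macro here is only a stand-in for the real one; in the proposal the error would be handed to the host (e.g. as a Java Exception subclass) rather than caught in C.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical integer error ids; only Err_data_component_version
 * appears in the proposal. */
enum { Err_data_component_version = 1 };

/* Hypothetical exception object: an error id plus a formatted message. */
typedef struct {
    int  id;
    char message[256];
} ErrInfo;

static jmp_buf *err_handler = NULL;  /* innermost "catch" point, if any */
static ErrInfo  err_last;            /* the error currently in flight */

/* Record the error, then unwind to the innermost handler. Anything
 * allocated by the frames being skipped is NOT freed. */
static void
throw_err(int id, const char *fmt, ...)
{
    va_list args;
    err_last.id = id;
    va_start(args, fmt);
    vsnprintf(err_last.message, sizeof(err_last.message), fmt, args);
    va_end(args);
    if (err_handler != NULL) {
        longjmp(*err_handler, 1);
    }
    fprintf(stderr, "Uncaught error %d: %s\n", id, err_last.message);
    abort();
}

#define THROW(id, ...) throw_err((id), __VA_ARGS__)

/* Stand-in for host code that calls into the library and catches.
 * Returns 1 and fills *out if an error was thrown, else 0. */
static int
call_and_catch(void (*fn)(int), int arg, ErrInfo *out)
{
    jmp_buf env;
    jmp_buf *prev = err_handler;
    assert(out != NULL);
    if (setjmp(env) == 0) {
        err_handler = &env;
        fn(arg);
        err_handler = prev;
        return 0;
    }
    err_handler = prev;  /* restore the outer handler after unwinding */
    *out = err_last;
    return 1;
}

/* Example library routine; the minimum version of 1 is hypothetical. */
static void
open_component(int format)
{
    const int min = 1;
    if (format < min) {
        THROW(Err_data_component_version,
              "Format version '%d' is less than the minimum "
              "supported version '%d'", format, min);
    }
}
```

The comment in throw_err() touches on the first objection above: longjmp() skips every frame between the throw and the handler, so memory those frames allocated leaks unless something else tracks and releases it.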
What would THROW try to do, and how?
Mike