[PATCH] D130516: [llvm] compression classes

Cole Kissane via Phabricator via cfe-commits Tue, 02 Aug 2022 12:17:53 -0700

ckissane added a comment.

In D130516#3694366 <https://reviews.llvm.org/D130516#3694366>, @dblaikie wrote:


> In D130516#3694151 <https://reviews.llvm.org/D130516#3694151>, @MaskRay wrote:
>
>> In D130516#3688236 <https://reviews.llvm.org/D130516#3688236>, @dblaikie 
>> wrote:
>>
>>> In D130516#3688123 <https://reviews.llvm.org/D130516#3688123>, @MaskRay 
>>> wrote:
>>>
>>>> I'd like to make a few arguments for the current namespace+free function 
>>>> design, as opposed to the class+member function design as explored in this 
>>>> patch (but thanks for the exploration!).
>>>> Let's discuss several use cases.
>>>>
>>>> (a) if a use case just calls compress/uncompress. The class design has 
>>>> slightly more boilerplate as it needs to get the algorithm class, a new 
>>>> instance, or a singleton instance.
>>>> For each new use, the number of lines may not differ, but the involvement 
>>>> of a a static class member or an instance make the reader wonder whether 
>>>> the object will be reused or thrown away.
>>>> There is some slight cognitive burden.
>>>> The class design has a non-trivial one-shot cost to have a function 
>>>> returning the singleton instance.
>>>
>>> Though there must've been a condition that dominates this use somewhere - 
>>> I'd suggest that condition could be where the algorithm is retrieved, and 
>>> then passed to this code to use unconditionally.
>>>
>>> If the algorithm object is const and raw pointers/references are used, I 
>>> think it makes it clear to the reader that there's no ownership here, and 
>>> it's not stateful when compressing/decompressing.
>>
>> A pointer to a singleton compression class is isomorphic to an `enum class 
>> CompressionType` variable.
>
> I don't mean to suggest that either design is fundamentally more or less 
> functional - I'm totally OK with/agree that both design directions allow the 
> implementation of all the desired final/end-user-visible functionality.
>
> I'm trying to make a point about which, I think, achieves that goal in a 
> "better" way - that's the space of design discussions, I think - what kinds 
> of (developer, maintenance, etc) costs different designs incur.
>
>> Using an enum variable doesn't lose any usage pattern we can do with a 
>> pointer to a singleton compression class.
>
> I agree that either design doesn't change what's possible - I do, though, 
> think that the "usage patterns" are meaningfully different between the two 
> designs.
>
>> An enum variable allows more patterns, as the allowed values are enumerable 
>> (we don't need to worry about -Wswitch for the uses).
>>
>> Say, we do
>>
>>   auto *algo = !compression::ZlibCompression;
>>   if (!algo)
>>     ...
>>   
>>   
>>   algo->compress(...);
>>
>> either together or apart, the result is similar to the following but with 
>> (IMO) slightly larger cognitive burden:
>>
>>   if (!compression::isAvailable(format))
>>     ...
>>   
>>   compression::compress(format);
>
> Specifically two APIs that are related (it's important/necessary to check for 
> availability before calling compress or decompress) in their contracts but 
> unrelated in their API use makes it easier to misuse the APIs and have a 
> situation where the availability check doesn't cover the usage. That's what I 
> think is important/important to discuss here.
>
>>>> (b) zlib compress/uncompress immediately following an availability check.
>>>>
>>>>   // free function
>>>>   if (!compression::zlib::isAvailable())
>>>>     errs() << "cannot compress: " << 
>>>> compression::zlib::buildConfigurationHint();
>>>>   
>>>>   // class
>>>>   auto *algo = !compression::ZlibCompression;
>>>>   if (!algo->isAvailable()) {
>>>>     errs() << "cannot compress: " << algo->buildConfigurationHint();
>>>>   }
>>>
>>> I think maybe this code might end up looking like:
>>>
>>>   Algo *algo = getAlgo(Zlib)
>>>   if (!algo)
>>>     errs() ...
>>>
>>> It's possible that this function would return non-null even for a 
>>> non-available algorithm if we wanted to communicate other things (like the 
>>> cmake macro name to enable to add the functionality)
>>
>> I think this is similarly achieved with an enum variable.
>> With the class based approach, a pointer has a static type of the ancestor 
>> compression class and a dynamic type of any possible algorithm.
>> This is not different from that: the enum variable may have a value the enum 
>> class supports.
>
> I agree that the code is similar in either case, but with a small difference 
> that is important to me - that accessing the algorithm necessarily (to some 
> degree - you could still have code that doesn't test the 
> condition/dereferences null, the same way that code can dereference an empty 
> Optional without checking first - but at least the API I'm suggesting makes 
> clear there's a connection between availability and usage).
>
>>>> (c) zlib/zstd compress/uncompress immediately following an availability 
>>>> check.
>>>>
>>>>   // free function
>>>>   if (!compression::isAvailable(format))
>>>>     errs() << "cannot compress: " << 
>>>> compression::buildConfigurationHint(format);
>>>>   
>>>>   // class
>>>>   std::unique_ptr<Compression> algo = make_compression(format);
>>>>   if (!algo->isAvailable()) {
>>>>     errs() << "cannot compress: " << algo->buildConfigurationHint();
>>>>   }
>>>
>>> I don't think there's a need for unique_ptr here - algorithms can be 
>>> constant singletons, referenced via raw const pointers/references without 
>>> ownership.
>>>
>>> & this example doesn't include the code that does the 
>>> compression/decompression,  which seems part of the discussion & part I 
>>> find nice in that the type of compression used matches the type used in the 
>>> check necessarily rather than being passed into two APIs independently.
>>
>> Thanks for clarification. Then this fits my "singleton compression classes 
>> are isomorphic to an `enum CompressionType` variable" argument :)
>
> I don't understand what you're saying here. Could you rephrase/expand a bit?

@dblaikie, @MaskRay I think I have worked out something that is the best of 
both worlds:
none compression is represented simply as a none type for use cases that will 
use `Optional<CompressionKind>`.
once you have a CompressionKind itself you can pass it around as a value 
(because a CompressionKind is just a struct containing one uint8_t (a fake 
"enum")).
due to my operator overloading, you can do stuff like this:

  llvm::compression::OptionalCompressionKind OptionalCompressionScheme =
            llvm::compression::getOptionalCompressionKind(CompressionSchemeId);
        if (!OptionalCompressionScheme) {
          return llvm::MemoryBuffer::getMemBuffer(Blob, Name, true);
        }
        llvm::compression::CompressionKind CompressionScheme =
            *OptionalCompressionScheme;
        if (!CompressionScheme) {
          Error("compression class " +
                (CompressionScheme->getName() + " is not available").str());
          return nullptr;
        }
        SmallVector<uint8_t, 0> Uncompressed;
        if (llvm::Error E = CompressionScheme->decompress(
                llvm::arrayRefFromStringRef(Blob), Uncompressed, Record[0])) {
          Error("could not decompress embedded file contents: " +
                llvm::toString(std::move(E)));
          return nullptr;
        }
        return llvm::MemoryBuffer::getMemBufferCopy(
            llvm::toStringRef(Uncompressed), Name);

(excerpt from `ASTReader.cpp`)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130516/new/

https://reviews.llvm.org/D130516

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D130516: [llvm] compression classes

Reply via email to