On Thu, Sep 26, 2013 at 11:12 AM, David Barbour <[email protected]> wrote:

>
> On Thu, Sep 26, 2013 at 10:03 AM, Sam Putman <[email protected]> wrote:
>
>> The notion is to have a consistent way to map between "a" large sound
>> file and "the" large sound file. From one perspective it's just a large
>> number, and it's nice if two copies of that number are never treated as
>> different things.
>>
>
> If we're considering the sound value, I think you cannot avoid having
> multiple representations for the same meaning. There are different lossless
> encodings (like Flac vs. Wav vs. 7zip'd Wav vs. self-extracting JavaScript)
> and lossy encodings (Opus vs. MP3). There will be encodings more or less
> suitable for streaming or security concerns. If we 'chunkify' a large sound
> for streaming, there is some arbitrary aliasing regarding the size of each
> chunk.
>
> So when you discuss a sound file, you are not discussing the value or
> meaning but rather a specific, syntactic representation of that meaning.
>
>
This is an excellent point of digression, I agree, and one which very much
requires a separate level of analysis to address.

(I'll indulge in a paragraph of digression: for a number that represents a
sound, we have our ears to provide a measure of the "sameness" or
"difference" of a given encoding, performance, or remix. Tools like Shazam
emulate this kind of fingerprinting algorithmically. I'd look into
something along those lines to provide equality or similarity testing
between data encoded to represent something sensory.)
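To make the "same meaning, different representation" point concrete: rather than hashing encoded bytes, we could hash a shared canonical decoded form. A minimal sketch, assuming each encoding can be decoded to the same 16-bit PCM samples (the function names here are illustrative, not a real codec API):

```python
import hashlib

def canonical_hash(samples):
    """Hash a canonical decoded form (a list of signed 16-bit PCM
    samples) rather than the encoded bytes, so different lossless
    encodings of the same sound compare as equal."""
    data = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)
    return hashlib.sha256(data).hexdigest()

# Two hypothetical decoders (stand-ins for a FLAC and a WAV decoder)
# that yield the same samples produce the same hash:
pcm = [0, 1000, -1000, 32767, -32768]
assert canonical_hash(pcm) == canonical_hash(list(pcm))
```

This only addresses lossless encodings; lossy ones would need the perceptual-fingerprint approach above.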



>
>>
>>> For identity, I prefer to formally treat uniqueness as a semantic
>>> feature, not a syntactic one.
>>>
>>
>> I entirely agree! Hence the proposal of a function hash(foo) that
>> produces a unique value for any given foo, where foo is an integer of
>> arbitrary size (aka data). We may then compare the hashes as though they
>> are the values, while saving time.
>>
>
> How often do we compare very large integers for equality?
>
> I agree that keeping some summary information about a number, perhaps even
> a hash, would be useful for quick comparisons for very large integers
> (large enough that keeping the hash in memory is negligible). But I imagine
> this would be a rather specialized use-case.
>
>
Rather often, and still less often than we should. Git does this routinely,
and Datomic revolves around it. The lower in the stack the capability is
built, the more often, in general, we can benefit from it.
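The Git-style trick can be sketched in a few lines: compare fixed-size digests in place of arbitrarily large values, treating digest equality as value equality with overwhelming probability. A minimal sketch (assuming nonnegative integers):

```python
import hashlib

def digest(value: int) -> bytes:
    """Content address for an arbitrary-size nonnegative integer:
    hash its big-endian byte representation."""
    raw = value.to_bytes((value.bit_length() + 7) // 8 or 1, "big")
    return hashlib.sha256(raw).digest()

big_a = 2**100_000 + 12345
big_b = 2**100_000 + 12345
# Compare two 32-byte digests instead of two ~12 KB integers;
# a collision is possible in principle but cryptographically negligible.
assert digest(big_a) == digest(big_b)
```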



>
>> Hashing is not associative per se but it may be made to behave
>> associatively through various tweaks:
>>
>> http://en.wikipedia.org/wiki/Merkle_tree
>>
>
>
> Even a Merkle tree or a tiger tree hash has the same problems with
> aliasing and associativity of the underlying data.
>
>
I would say, rather, that it has a subset of those problems, one that takes
less overall computation, and more systems engineering, to solve.

With thoughtful engineering, we can get a consistent hash over a given
tree, at least most of the time. If we occasionally have to flatten a tree
and treat it as a single large number, so be it.

Since, if I'm understanding correctly, a tacit concatenative language would
have 'words' in the Forth sense, making those words the nodes of the tree
would make defining a consistent tree hash at least approachable. Doing
this for arbitrary structured data would greatly reward a consistent
approach to defining and parsing structured/typed data, which is likely
another digression.
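The "words as tree nodes" idea might be sketched like this, with leaves standing in for Forth-style words. This is an illustration only; the word names, the leaf/node tagging scheme, and the choice of SHA-256 are all my assumptions:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle(node) -> bytes:
    """Merkle-style hash of a program tree whose leaves are 'words'
    (strings). An interior node's hash covers its children's hashes,
    so equal structure and equal leaves yield an equal root hash.
    The leaf/node tags keep a leaf from colliding with a subtree."""
    if isinstance(node, str):                  # leaf: a word
        return h(b"leaf:" + node.encode())
    return h(b"node:" + b"".join(merkle(c) for c in node))

prog_a = ["dup", ["swap", "drop"], "add"]
prog_b = ["dup", ["swap", "drop"], "add"]
assert merkle(prog_a) == merkle(prog_b)
```

Note that this hash is deliberately structure-sensitive: two trees that flatten to the same word sequence but nest differently hash differently, which is exactly the aliasing question raised above.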

cheers,
-Sam.
_______________________________________________
fonc mailing list
[email protected]
http://vpri.org/mailman/listinfo/fonc
