Re: [Jchat] algebraic data types

Viktor Cerovski Thu, 09 Feb 2012 07:54:35 -0800


Raul Miller-4 wrote:
> 
> [comments inline, because otherwise I become lost]
> 
> On Tue, Feb 7, 2012 at 3:43 PM, Viktor Cerovski
> <[email protected]> wrote:
>> Raul Miller-4 wrote:
>>>> Here is where ADT show some power in reasoning about data,
>>>> because we can ask what does it mean for instance, "everything
>>>> is array".  Well, as I wrote some time ago, to me in J scalars are
>>>> not arrays because the equation
>>>>  a-:a,x
>>>> has no solution for any scalar a, but has solution for an array a.
>>>> So, ADT could be helpful in sorting this type of questions with
>>>> some rigour.
>>>
>>> But why is the ',' operation the determining issue, here?  Is your
>>> reasoning based on the existence of the ',' verb?
>>
>> Indeed. But this existence is not accidental---pretty much every
>> programming language has some sort of append, concatenate, etc.
>> It's as far as J primitives go, pretty primitive operation, also
>> frequently used.
> 
> Agreed -- it's a simple operation which is useful fairly wide variety
> of contexts.
> 
>> Furthermore, I'd like to put "everything is array" to the test: are
>> scalars arrays?  Is there any way to decide or test it or is it a mere
>> conventional saying?
> 
> It's clear to me that everything is not an array.
> 
> For example: verbs, such as the operation denoted by (,) are not
> arrays (though they may represented as arrays and their arguments and
> results are arrays).
> 
When I said "everything" I meant nouns, and in particular:
are scalars arrays?



>> Then I said scalars are not arrays because I don't have a noun in J
>> that I can append to a scalar and get that same scalar, but I have
>> such nouns in the case of arrays.  Sounds somewhat arbitrary?  I'll
>> come back to this question from the perspective of ADT.
> 
> Indeed, it sounds arbitrary.  In part because "append" is arbitrary.
> I (and have) constructed programs that use arrays which do not use
> append.  Why should append be relevant to those programs?
> 
For programs that don't use append, the very existence of append
and its properties is of course totally irrelevant.


>> First, the way you circumvent the problem:
>>
>>> If so, let's say I defined:
>>>
>>> viktor=:4 :0
>>>   R=. x,y
>>>   if. (1-:#R) *. 0 e. x ,&(#@$) y do. {.R end.
>>> )
>>>
>>> ..and let's say that I use this 'viktor' operation in place of ',' --
>>> has the definition of "Array" changed?
>>
>> Array hasn't changed, but append *has* changed to accommodate
>> the exception.
> 
> No, append has not changed.  Not unless we define append as something
> like "whatever operation we happen to be considering".  The existence
> of (viktor) does not replace the existence of (,).
> 
append is supposed to add one extra property to , without breaking
any of the current properties of , .  It's a tiny bit more general than ,



>>  Equivalently,
>>
>>   p =: *.&(''&-:) $@]
>>   append =: ,`[@.(p~)`]@.p
> 
>    3 (append -: viktor) i. i. 3
> 0
> 
>> (x p y is the predicate: is x scalar and y empty array?)
>>
>>> Is your argument that , is a language primitive where 'viktor' is not
>>> a language primitive?  If so, does that mean that
>>>
>>>    i.2 2
>>> 0 1
>>> 2 3
>>>
>>> is not a matrix, because +/ .* is not a primitive?
>>
>> I think that if , were defined as you suggest or in a similar way
>> we would have had somewhat better definition of append, more
>> attuned to the 'everything is array' statement.
> 
> But why?
> 
> We can have all three definitions -- the language allows us to do that.
> 
It's about what should , as a primitive operation of the language do.


>> ADT let us see better where is the problem.
>> I begin with the following definition of the type of integer J arrays as:
>>
>> Jarr = [int]*[int]
>>
>> The first list is the rank, the second elements.  Clearly, scalars fit
>> into
>> this type---number 5 would be represented as []*[5], number 2 as []*[2]
>> 5 + 2 as []*[5] + []*[2] = []*[7], and appending is done as:
>> []*[5] , []*[2] = [2]*[5 2].   So far so good, everything is array,
>> and we actually formulated ADT that accurately describes J int arrays,
>> no?
> 
> Maybe... (note that I have to buy into the unstated idea that a square
> bracket delimited list is a one dimensional list of constant time, and
> I think another unstated idea is that the product of the first list
> must describe the number of elements in the second list, there may be
> other issues).
> 
Yes.  These issues can be handled if we had more precise notion of ADT
than we are using now, so I acknowledge them with remark that they
are not relevant for the main issues of our discussion.


>> Well, the problem is that Jarr type, together with operations + and ,
>> is isomorphic to (which roughly means "the same as" or "equivalent to"
>> or "structurally same as") the following type:
>>
>> Jarr' = int + [int]*[int]
> 
> Or we could say:
> 
> Jarr'= int + [int]+ [[int]] + [[[int]]] + ...
> 
> with some constraints on the sizes of the underlying sublists.
> 
> But I see no advantage to mixing explicit shape with implicit shape
> for different instances of the same type.
> 
You have to specify type in such a way that I know how to construct
value of that type.  More importantly, type has to be define so that
computer knows how to construct a value of the type.  So, the
last definition is faulty insofar as it does not contain the shape
information in it.

Here is one in the same spirit:

Jarr'1 = int + int*[int] + int*int*[int] +int*int*int*[int] + ...

Now the issue here is that of termination (i.e what's "..."),
so we could end Jarr'1 with some finite numbers of int*int*..*int
(the number would be the maximal allowed rank of an array in
the corresponding implementation of J int array) or define some 
alternative way to have it extend indefinitely.

That alternative way is of course present in both Jarr and Jarr' 
due to shape being a list.



>> where the list representing rank is now further constrained to
>> *non-empty* ranks, while operations + and , now work as:
>> 2 + 5 = 7
>> 2 , 5 = [2]*[2,5]
>> 2 , [1]*[5] = [3]*[2,1,5]
>> etc
> 
> Here 7 is equivalent to []*[7]
> 
It's equivalent in the sense of two structures being isomorphic,
but when we get to programming, we have to choose between
the two (or have them both, when they would be independent 
one from another).

Whichever way, when we have scalars represented via Jarr',
2 + 5 is 7, not []*[7].  That is in a nutshell the main advantage
of Jarr' over Jarr -- it can specialize verbs over numbers!

Let me give you one real-world example:
I'm working as the spare time permits on a LISP implementation of
J, I call it J. (with the dot after J).

This implementation does lots of type-foo a-la Jarr', and is sometimes
able to get through and generate and execute fast code on the spot.
Here is an example:

In J:

   (+%)/ %:i.1000000
0.652664
   ts'(+%)/ %:i.1000000'
1.67805e7 0.885019

In J. implementation of J:

(+%)/ %:i.100000.0
Evaluation took:
  0.139 seconds of real time
  0.130000 seconds of total run time (0.110000 user, 0.020000 system)
  93.53% CPU
  164 lambdas converted
  323,250,718 processor cycles
  24,330,976 bytes consed

 0.6526637623347914

So J. implementation somehow manages to compile 164 lambdas 
and still compute the result ~6x faster than J with the somewhat 
larger memory footprint (~24Mb in J. and ~16MB in J).  No special 
code has been used.

Detailed analysis would be too involved and the speed is not exclusively 
because of Jarr'-like representation, and I haven't yet studied Roger's code
(I will eventually of course), but J. so far looks very promising.

I don't want to give false impression that J. is always so fast---in fact
it is not and for some quite standard J expressions can be like 15x
slower than J, so there are still few things to figure out.
I'll post further info on the list about J. project when it reaches some
kind of alpha stage.



>> We *explicitly* discriminate between scalars and arrays that are
>> not scalars at the type level.  So which one is the correct type?
> 
> Jarr represented scalars by having the first list of the list pair be
> empty.  And, Jarr represented non-scalars by having the first list of
> the pair be non-empty.  I see nothing wrong with using Jarr.
> 
> But if we can argue that Jarr is not the correct type, then that
> suggests that an absolute value operation which takes an int argument
> and produces an int result requires that negative values cannot be
> considered to be int (because the absolute value operation cannot
> produce negative int results from an int argument).
> 
> But, in my mind, the existence of an absolute value operation would be
> no justification for saying that int now must be represented as:
> 
> positiveInt + generalInt
> 
> I can see that as an allowable expansion, but... required?   I am not
> seeing the requirement.
> 
>> Jarr or Jarr'?  Both are equivalent at the level of J, since J does not
>> insist on either Jarr or Jarr' representation of arrays.  Insisting on
>> Jarr
>> would have been pointless, because definition of J as we know it
>> would not have changed if we used Jarr' to implement it.
> 
> But, likewise, insisting on Jarr' seems pointless.
> 
>> but this also means that the question "is everyting array in J?"
>> is a mute point, almost an arbitrary qualification.  It's nothing
>> wrong with it, there are very nice things in J like identity functions,
>> but how can it be justified from the perspective of ADT?
> 
> It seems, to me, that introducing Jarr''s extra complexity would be
> the thing needs to be justified.
> 
>> Here neither yours nor mine augmentation of , would make
>> any difference, because if we program them using Jarr type,
>> then we still dynamically check whether something is scalar or
>> not, and if we program them using Jarr' type, we'd end up
>> most likely defining a polymorphic function/procedure that
>> discriminates between int and [int]*[int] types from Jarr'.
> 
> Yes, I think we can agree that Jarr''s extra complexity does not
> prevent it from emulating Jarr.
> 
That is exactly the position that I hold --- there is no some formal
requirement to pick either Jarr or Jarr' as preferred choice.
The choice still can be made based on more "mundane" things like
speed or efficiency of representations, or simplicity of code, or even 
elegance.



>>> I am trying to make sense of your reasoning, and I am struggling.
>>>
>>>>> But a bigger problem is that we don't really have a good definition of
>>>>> "type".  And by "good" I mean:
>>>>>
>>>>> a. concise,
>>>>> b. accurate, and
>>>>> c. consistent.
>>>>>
>>>> I don't know what's wrong with types as we use it today,
>>>> either in simplest cases or in some deep theoretical sense.
>>>
>>> That sounds excellent -- but what is this definition for 'type'?
>>>
>> We gather some things together and give them a common name:
>> like numbers 1,2,3,4,5 and call them int1to5, or True and False
>> and call them Bool, or all 32 bit machine integers and call them Int32,
>> all possible lists of ints, including empty list, and call them [int]
>> etc.
>>
>> There also have to be some operations available and/or definable
>> in order to make programs, and it is often that types are chosen
>> according to operations that are to be done with values of a given
>> type.
> 
> Ok... I will try and keep this definition in mind next time I see
> statements that make assertions based on concepts of "type", to see if
> the statements still make sense.  Thank you.
> 
> But I often see "type" used as almost equivalent to "domain" and
> "range" -- for example, the distinction between "static" and "dynamic"
> type systems seems to imply that in a "static type system" the domain
> of all functions must exactly match the underlying type.  I am not
> sure how I could make sense of that kind of thinking with your
> definition.
> 
In first approximation, we can think functionally: every operation is 
a function that maps its domain to its codomain.  Domain and 
codomain are types.  Dynamically typed languages give impression
that there is one or a few types that are sufficient for everything,
so instead of thinking "what's the type that's suited for my problem",
I use the provided types for representation, taking them as givens that
I'm accustomed to using.

Well, single or a few types are always sufficient. One can program 
anything in sed, using streams of strings. Period.

However, explicit typing allows us to: tune programs by fiddling with
data representations (here I wan't array, not list, single float, not
double);
make programs where compiler proves things for us, like I want to be sure
that all functions of my module generate only non-empty strings;  
have conformance of input and output of a function to predefined types so 
that program does not have to check run-time whether empty string or 
some malformed input has been fed to a function; have purely functional
input-output, etc.

You know the joke: if program compiles in Haskell, then it's correct.

Of course, one can go blind in Haskell figuring out why a particular 
f o g doesn't compile given types of f and g and compiler error message,
just as well as in J, figuring why u"1 _1 doesn't give the result one 
wants and expects given some input arrays, or get conjunctivitis doing
tacit J.



>>>>> Note also that i. i. 4 has 0 ints... and a shape of four.  So the type
>>>>> here might be "1" though if you also consider the shape information
>>>>> the type here might be int*int*int*int.
>>>>>
>>>> oh, I see where you're headed.  You're asking for type
>>>> representation for arrays including empty ones of arbitrary
>>>> shape.  Well, ADT might be simply:
>>>>  [int]*[int]
>>>> so your array would be represented as [0,1,2,3]*[]
>>>
>>> Ok, though this leaves me with issues like: what is the type of the
>>> array
>>> (0 1)?
>>>
>> Its type is [int]*[int] (this is Jarr), and concrete value that
>> represents
>> the array is [2]*[0 1].
>>
>> It could be also of type Jarr' and the same representation just as well,
>> as I discussed moment ago.
> 
> 0 and 1 are boolean values, in J.
> 
> But perhaps you are saying that Jarr deviates from J in this context.
> 
Yes, it deviates in this regard from J.


>>> Hypothetically, I could see a J array being:  T*[int]*[atom] where T
>>> matches the type of the atoms and product([int]) matches the number of
>>> atoms.
>>
>> Sure thing (just the type would be more like T*[int]*[T], and my
>> previous example with Jarr' would now become T*(T+[int]*[T]).)
> 
> And, of course:  T*[int]*[T]  <  T*(T+[int]*[T])
> 
> But if we ignore the higher ranked arrays, your variant simplifies to
> T*T which seems wrong.
> 
You're right, I made the mistake.  We can parametrize Jarr or Jarr'
over some type T simply as:

Jarr T = [int]*[T]
Jarr' T = T+[int]*[T]

to cover Jarr float, Jarr char, etc...


>>>   But the underlying values here seem to be near-infinities, and
>>> I am not certain how I could work with them.  Or, if I take their base
>>> 2 log, they seem to be a [relatively bland] restatement of the numbers
>>> used by memory management.
>>>
>> I'm not sure what you mean: when we define type, we haven't really
>> constructed a single instance of the type, just constructively described
>> (finite or potentially infinite) collection of values that belong to the
>> type.
> 
> I mean that the algebraic data type system seems to want to use
> extremely large numbers when describing otherwise practical problems
> (remembering that if ('int' represents a 32 bit integer value) then
> ('int' is 2^32) -- coincidentally a value too large to be represented
> directly, using the 'int' data type).
> 
Still don't get when/where consideration of such large numbers 
is necessary within ADT design?  There are 2^32 ints, but int itself 
is not a value, but type, and int stands for any one particular int,
not all of them at once.

Maybe you're thinking of the set of ints having 2^32 elements?
That's true of course, but we don't have to really construct this
whole set to do ADT.

Note also that the type of list of ints, [int], describes infinitely many 
values: one empty list, 2^32 lists of one int, 2^32*2^32 lists of two ints,
ad infinum, but of course we don't have to count them, let alone 
construct them all, to do ADT. 


>> When the program rolls on, 0 or more instances of defined types are
>> created, depending on program.
> 
> But those are values, and not algebraic data types.
> 
> -- 
> Raul
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
> 

-- 
View this message in context: 
http://old.nabble.com/algebraic-data-types-tp33260423s24193p33294214.html
Sent from the J Chat mailing list archive at Nabble.com.

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jchat] algebraic data types

Reply via email to