[Corpora-List] Re: Any literature about tensors-based corpora NLP research with actual examples (and homework ;-)) you would suggest? ...

Ada Wan via Corpora Thu, 03 Aug 2023 07:41:43 -0700

Dear Hugh

Thanks. I think it is important to have the right unit of measurement (not
just how we refer to them) --- both for processing and evaluation.
While "strings" could be a more general reference, units of finer
granularity (e.g. characters, bytes, character-/byte-n-grams) would be more
precise, and units of larger span (e.g. documents) would be more suitable for
many computational multilingual processing tasks (e.g. in the case of
parallel corpora).
But, right, I don't disagree with using the term "strings" either, when
used appropriately.
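
As a rough illustration of how the choice of unit changes what gets
counted, here is a minimal Python sketch. The sample string and the helper
name char_ngrams are assumptions made for this example only; they do not
come from any particular paper or toolkit.

```python
# Illustrative sketch: the sample text and helper name are hypothetical.
def char_ngrams(text: str, n: int) -> list[str]:
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

sample = "a\u00f1o"                  # "año": three characters
chars = list(sample)                 # character units
raw = sample.encode("utf-8")         # byte units under UTF-8
# Byte bigrams can cut through a multi-byte character.
byte_bigrams = [raw[i:i + 2] for i in range(len(raw) - 1)]

print(len(chars), len(raw))          # 3 characters, 4 bytes
print(char_ngrams(sample, 2))        # two character bigrams
print(byte_bigrams)                  # three byte bigrams
```

The same "string" thus yields different counts and different n-gram
inventories depending on whether it is measured in characters or in bytes.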


Best
Ada


On Wed, Aug 2, 2023 at 10:39 PM Hugh Paterson III <[email protected]>
wrote:

> Dear Ada,
>
> I think I am agreeing with you in terms of finding the right labels for
> the scientific units of reference. I have always wondered why computational
> linguists have not just simply called these units "strings".
>
> Kind regards,
> Hugh
>
> On Wed, Aug 2, 2023 at 11:12 AM Ada Wan via Corpora <
> [email protected]> wrote:
>
>> Re RML or any "text technologies" leveraging "grammar" (misnomer or not):
>> it is not the right time to be "campy" about (as in, to be
>> arguing/protesting for) "grammar", esp. if you do not have a
>> background in Linguistics.
>> There has been quite some abuse/misconduct with
>> concepts/units/assumptions such as "words", "sentences", and "grammar" in
>> the language space (with or without computational implementation).
>>
>> The priority of my communications here is to clarify the part on the
>> scientific front, to make sure that anyone who happens to have gotten
>> involved in this space can come to more clarity on the status quo,
>> esp. given my results. There is a lot that needs to be re-evaluated and
>> re-interpreted. Simply stating that something might have been useful in the
>> past is not going to be helpful with going forward.
>>
>> If one is working in technologies with language/text data (e.g., in a
>> user-based format/framework, and not working on "grammar" as a
>> "linguistic"/philological pursuit), it is recommended that the name(s) of
>> such technologies get updated --- if "grammar" [1] does not have to be
>> mentioned or be involved, don't.
>> [1] or, including but not limited to any of the following: "word",
>> "sentence", "linguistic structure(s)", "meaning", "morphology", "syntax",
>> "parsing", various terms related to parts of speech (e.g. "nouns",
>> "verbs")....
>>
>> Re "BTW, regarding that "parsing" aspect, what is the term used to
>> describe the gradual process of "terminological inception"?":
>> conceptualization? Coining of terms?
>> According to me, "lexical priming" is different from "terminological
>> inception".
>>
>> Re "How could you clarify intersubjectivity?":
>> https://en.wikipedia.org/wiki/Intersubjectivity :)
>> Your question is way too broad, or requires an answer that is such, which
>> I cannot entertain at the moment.
>>
>> Thanks for sharing your perspectives. I must admit I have not had time to
>> digest all of your points. But this impression kept recurring to me as I
>> was reading them:
>> sometimes, I sense that when one claims some concepts are not universal
>> (e.g. the ones mentioned in [1] above), others take it to mean that all
>> concepts are categorically invalid. That is not what I intended to communicate (with
>> all my papers, scientific work, and my comments here). It is an expert
>> opinion/finding that I shared, upon some careful evaluation.
>>
>>
>> On Tue, Aug 1, 2023 at 10:26 PM Albretch Mueller via Corpora <
>> [email protected]> wrote:
>>
>>> On 7/31/23, Ada Wan <[email protected]> wrote:
>>> > That having been expressed, here are a couple of points re RML that
>>> > one should pay heed to:
>>> > i. to what extent and in what context is this a technology relevant?
>>>
>>>  If you were able to devise an algorithm which, taking as input only NL
>>> texts (composed of: a) a start (semantic end); b) a sequence of
>>> characters from a relatively large and representative text bank; c) an
>>> end (a semantic start)), is able to exhaustively "deduce" the grammar
>>> of such texts, in addition to being able to use it with any language,
>>> you would then:
>>>
>>>  1) have defined a "space"/"coordinate system" for those texts, to
>>> frame (pretty much) all possible "meaningful 'points'"/"phrases" in
>>> terms of such grammar, which would also;
>>>  2) be a 0-search structure describing the text bank/corpus (every
>>> text segment would also become a pointer to every single actualization
>>> of that very segment in all texts, no more "n-grams" necessary!),
>>> which could;
>>>  3) be used with minimal turking/supervision to:
>>>  3.1) clean up all automatic translations from YouTube;
>>>  3.2) keep multilingual corpora;
>>>  3.3) use it for automatic translations (demonstrably, in an almost
>>> foolproof, perfect way, since you always have the words/phrases with
>>> their context);
>>>  3.4) "cosmic/tree reading": instead of reading books/sequences of
>>> characters, you would read that text as it relates to all other texts
>>> from the same topic;
>>>  3.5) parsing: you would keep a corpus of what you know so you won't
>>> have to reread about certain topics and aspects you already know
>>> (great Lord! how I hate reading a whole book to only find a few, at
>>> times marginal, sentences worth reading! or that "youthful" thing of
>>> thinking that they just discovered/created an idea because they are
>>> just verbalizing it or made a movie about it!) BTW, regarding that
>>> "parsing" aspect, what is the term used to describe the gradual
>>> process of "terminological inception"? I have heard the term
>>> "Adamization", but, even though that word doesn't really rub me the
>>> wrong way, I could imagine it is "too sexist" to some people. I
>>> wouldn't really care calling it Eveization or "pussyfication" or
>>> whatever. I just don't want to use the term that the government uses:
>>> "lexical priming" and "terminological inception" sounds too cumbersome
>>> as a verb: "terminologically incept"? doesn't sound OK in English;
>>>  3.6) of course, an easy application of that contextual parsing would
>>> be removing all that js crap and ads before they reach your awareness;
>>>  ...
>>>  3.n) not last and definitely not least I am thinking hard about how
>>> to make sure police and politicians at least have a hard time while
>>> using what I have described to "freedom love" people (I know, I know,
>>> ... "3.n" doesn't "technically" pertain to quality of implementation
>>> issues ..., but I, for one, disagree. Given the "all tangible things"
>>> (tm) panopticon in which we are all living these days, each of us in
>>> one's own "virtual prison cell" to call it somehow, we should also
>>> think about, and be openly honest about, such matters)
>>>
>>>  I am working right now on such Leibnizian "characteristica
>>> universalis" kind of thing. First cleansing approx. 1.2 million texts
>>> mostly from archive.org, *.pub and the NYS Regents exams
>>> (nysedregents.org + nysl.ptfs.com) which they have, at least
>>> partially, translated to more than 10 languages. Is that relevant
>>> enough to you? ;-) I am also being quite selfish about it because I
>>> have always dreamed of being able to "read"/mind all texts which have
>>> ever been written in the same way that teens think they have to have
>>> sex with everybody in town to make sense of things.
>>>
>>> > ii. one can certainly dissect/decompose texts ...
>>>
>>>  Computing power has become insanely cheap, but it has also enabled
>>> too much "cleverhansing" out there. The Delphic phrase: "you can make
>>> sense or money" these times translates as some sort of corollary to:
>>> "using computers and then thinking about it makes you smart"; but,
>>> does it really?
>>>
>>>  It amazes me how easily you can "dissect"/"decompose texts", talk
>>> about "tensors", "vectors", ... (I am not trying to police language
>>> usage, it just amazes me); let alone all the insufferable bsing claims
>>> by the "Artificial Intelligentsia".
>>>
>>>  I would go with one character after the other and an open attempt to
>>> use the minimal amount of principles to then see what I get. IMO, when
>>> you start getting too smart about what you do, of course, you will
>>> "see" how smart you are. The poet in me likes Borges' stanzas: "... el
>>> nombre es arquetipo de la cosa, en las letras de 'rosa' está la rosa y
>>> todo el Nilo en la palabra 'Nilo'" ("its name is a thing's archetype,
>>> in the letters of 'rose' is the rose and the whole of the Nile (river)
>>> in the word 'Nile'")
>>>
>>> > II. Re ""magical" in the sense that when we go about our
>>> > intersubjective business": some intersubjectivity can be further clarified.
>>> > I don't see much of your examples as being "magical".
>>>
>>>  I actually do! How could you clarify intersubjectivity? I am trying
>>> to do so (somewhat) Mathematically (to the extent you could). Could
>>> you share any papers, "prior art" on such matters?
>>>
>>> > ii. "other people may read, mind, as well ...;": so?
>>>
>>>  which is a good thing; it is alright, fine and dandy in the hippie way,
>>> I meant.
>>>
>>> > iii. "Alice bought some veggies from Bob, ...)": this I don't
>>> > understand.
>>> > iv. "We see more in money ("words", ...) than just a piece of paper"
>>>
>>>  iii. and iv. overlap to some extent so I will try to explain them
>>> both quickly (which is impossible since you can write philosophies
>>> about each line, but there I'll go). To understand what Marx (may
>>> have) meant by „gesellschaftlich notwendige Arbeitszeit” ("socially
>>> necessary labour time", wording which has made quite a few go berserk
>>> ever since):
>>>
>>>  https://en.wikipedia.org/wiki/Socially_necessary_labour_time
>>>
>>>  https://en.wikipedia.org/wiki/Transformation_problem
>>>
>>>  you have to understand the basic mathematical concepts of:
>>>
>>>  a) combined rates, and
>>>  b) intratextual systems of linear equations
>>>
>>>  Based on my teaching experience §b is easier to understand. Sorry I
>>> couldn't find an "easier" explanation on youtube of that type of SLEs
>>> than the one I used with my students preparing for the Regents:
>>>
>>>  https://ergosumus.files.wordpress.com/2018/10/sle04-en.pdf
>>>
>>>  the intratextuality of those problems matters to corpora research
>>> because different strata of "like terms" ("verbs", "adjectives", ...)
>>> are what create grammar. "Crazy me" thinks you could to some extent
>>> describe the "likeness of terms" underlying grammar!
>>> ~
>>>  I also have a guideline about combined rates which I successfully
>>> used with my students:
>>>
>>>  https://ergosumus.files.wordpress.com/2018/06/word_problems12-en00.pdf
>>> ~
>>>  What the eff do combined rates and SLEs have to do with Marx's
>>> transformation problem? ;-)
>>>
>>>  Well, notice that the -equitable aspect- used to solve combined rates
>>> problems is the time (regardless of how differently fast one "works"
>>> in comparison with others). There is also another type of combined rates
>>> problem: you drive to some place with a friend who doesn't care about
>>> driving fast, but you need to rest so she drives for a while ... that
>>> problem is different from two people meeting at a place, each driving
>>> "in their own car" (at their own average speed).
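>>>
>>>  A rough worked sketch of the cases above may help. The symbols r_A,
>>> r_B, W, v_1, v_2, t_1, t_2, t and d are introduced here purely for
>>> illustration. When two people work at the same time, or drive towards
>>> each other, the elapsed time t is the shared quantity and the rates add;
>>> when they take turns in one car, the legs are sequential and the times
>>> add instead.
>>>
>>> ```latex
>>> % Working together on a job of size W: same time t, rates add
>>> (r_A + r_B)\, t = W \quad\Longrightarrow\quad t = \frac{W}{r_A + r_B}
>>>
>>> % Two drivers meeting, starting a distance d apart: same time t, speeds add
>>> (v_1 + v_2)\, t = d
>>>
>>> % Taking turns in one car over a distance d: times add
>>> v_1 t_1 + v_2 t_2 = d, \qquad t = t_1 + t_2
>>> ```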
>>>
>>>  Serge Heiden shared a paper about presidential debates which could
>>> also be studied mathematically as a CR kind of problem (even if
>>> politicians, as the crowd management clowns they all are, don't have to
>>> make sense, anyway), but as happens with any dialogue, there are
>>> parts of the conversation in which both the cars and the time are
>>> shared and other times when only (or more of) the time is. I don't know
>>> of a general mathematical formulation of CR kinds of problems which
>>> could be used for corpora research. On my "to do" list I have writing
>>> papers studying Euclid's Elements and Plato's Dialogues in that way.
>>>
>>>  Karl Marx, as part of his „Wertgesetz der Waren” (rechristened in
>>> English as "labor theory of value"), somewhat metaphorically stated
>>> that the exchange value of a commodity is a function of "society's
>>> labour-time". He also rendered his ideas as equations (in more of a
>>> verbally descriptive, metaphorical way), but that phrase, "society's
>>> labour-time", was and still is found anywhere from questionable to
>>> unfalsifiably wild. I don't claim to have mind reading powers, but I
>>> think that in his letter to his friend Ludwig Kugelmann, thoroughgoing
>>> Hegelian that Marx was, he clearly explained what he meant (page: 222 in
>>> file, 208 in book):
>>>
>>>
>>> https://archive.org/download/marxengelsselectedcorrespondence/Marx%20%26%20Engels%2C%20Selected%20Correspondence.pdf
>>>
>>>  Marx To Ludwig Kugelmann In Hanover London, July 11, 1868:
>>>  All that palaver about the necessity of proving the concept of value
>>> comes from complete ignorance both of the subject dealt with and of
>>> scientific method. Every child knows that a nation which ceased to
>>> work, I will not say for a year, but even for a few weeks, would
>>> perish. Every child knows, too, that the masses of products
>>> corresponding to the different needs require different and
>>> quantitatively determined masses of the total labour of society. That
>>> this necessity of the distribution of social labour in definite
>>> proportions cannot possibly be done away with by a particular form of
>>> social production but can only change the mode of its appearance, is
>>> self-evident. No natural laws can be done away with. What can change
>>> in historically different circumstances is only the form in which
>>> these laws assert themselves. And the form in which this proportional
>>> distribution of labour asserts itself, in a state of society where the
>>> interconnection of social labour is manifested in the private exchange
>>> of the individual products of labour, is precisely the exchange value
>>> of these products.
>>> ~
>>>  So, as I see it, in a Hegelian way, Marx was seeing the whole of
>>> society as a corpus (in which we all live through our own
>>> texts/narratives) talking about "socially necessary labour time" in
>>> the way that "time" becomes the equitable aspect shared when
>>> people/(-society as a whole-) work together as described by combined
>>> rates kinds of problems.
>>>
>>>  When "Alice buys some veggies from Bob, ..." she uses money as the
>>> "equitable aspect" to get Bob's veggies (in the Marxian way they are
>>> both part of a combined rates problem) and you tell me this is not
>>> magical!
>>>
>>> > v. "some transactional electronic ("air"...) excitations": I don't get
>>> > this.
>>>
>>>  you may pay with cash using coins or bills or using your debit card
>>> which at the end of the day become transactional electronic
>>> excitations on some hard drives. When you speak there is more to it
>>> than vibrations/fluctuations of air. (I am referring to the medium
>>> which Saussurean signifiers use)
>>>
>>> > vi. "your 'magic' and mine are different we are still able to
>>> > 'communicate'. How on earth do such things happen?": a disclaimer: I am not
>>> > using any magic in my attempts to communicate with you here. I try my best
>>> > to place myself in your shoes to guesstimate the points that you are trying
>>> > to get across. But many (as you can see above) didn't quite reach me.
>>>
>>>  "I try my best to place myself in your shoes" ... ;-) Ha, ha, ha!
>>> that is just a functional illusion. What do you know about "my shoes"?
>>> I work as a gardener (which I love to do) so they are dirty and
>>> smelly, ... I also love to eat garlic ... As I see things standing on
>>> "my dirty and smelly shoes and voicing it from my garlicky mouth"
>>> being honest and true to matters is good enough.
>>>
>>>  lbrtchx
>>> _______________________________________________
>>> Corpora mailing list -- [email protected]
>>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>>> To unsubscribe send an email to [email protected]
>>>
>>
>

