Dear Hugh,

Thanks. I think it is important to have the right unit of measurement (not just the right label for it) --- both for processing and for evaluation. While "strings" could be a more general reference, units of finer granularity (e.g. characters, bytes, character-/byte-n-grams) would be more precise, and units of bigger span (e.g. documents) would be more suitable for many kinds of computational multilingual processing (e.g. in the case of parallel corpora). But, right, I don't disagree with using the term "strings" either, when used appropriately.
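To illustrate how the choice of unit changes what gets counted, here is a minimal sketch in Python 3. The example string, the helper name `ngrams`, and the choice of UTF-8 are assumptions made for illustration only; none of them comes from the thread.

```python
def ngrams(seq, n):
    """Return all contiguous n-grams of a sequence (works for str and bytes)."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


# Illustrative string only: "naive" spelled with U+00EF (i with diaeresis).
text = "na\u00efve"
# The same string as UTF-8 bytes; the encoding is an assumption.
data = text.encode("utf-8")

char_bigrams = ngrams(text, 2)  # units are Unicode code points
byte_bigrams = ngrams(data, 2)  # units are bytes

print(len(text), len(data))     # character count vs. byte count
print(char_bigrams)
print(byte_bigrams)
```

The one non-ASCII character occupies two bytes in UTF-8, so the same "string" yields different lengths and different n-gram inventories depending on whether it is measured in characters or in bytes.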
Best
Ada

On Wed, Aug 2, 2023 at 10:39 PM Hugh Paterson III <[email protected]> wrote:

> Dear Ada,
>
> I think I am agreeing with you in terms of finding the right labels for
> the scientific units of reference. I have always wondered why computational
> linguists have not just simply called these units "strings".
>
> Kind regards,
> Hugh
>
> On Wed, Aug 2, 2023 at 11:12 AM Ada Wan via Corpora <[email protected]> wrote:
>
>> Re RML or any "text technologies" leveraging "grammar" (misnomer or not):
>> now is not the right time to be "campy" about (as in, to be
>> arguing/protesting for) "grammar", esp. if you do not have a
>> background in Linguistics.
>> There has been quite some abuse/misconduct with
>> concepts/units/assumptions such as "words", "sentences", and "grammar" in
>> the language space (with or without computational implementation).
>>
>> The priority of my communications here is to clarify the part on the
>> scientific front, so that anyone who happens to have gotten involved
>> in this space can come to more clarity on the status quo,
>> esp. given my results. There is a lot that needs to be re-evaluated and
>> re-interpreted. Simply stating that something might have been useful in the
>> past is not going to help with going forward.
>>
>> If one is working in technologies with language/text data (e.g., in a
>> user-based format/framework, and not working on "grammar" as a
>> "linguistic"/philological pursuit), it is recommended that the name(s) of
>> such technologies get updated --- if "grammar" [1] does not have to be
>> mentioned or be involved, leave it out.
>> [1] or, including but not limited to any of the following: "word",
>> "sentence", "linguistic structure(s)", "meaning", "morphology", "syntax",
>> "parsing", various terms related to parts of speech (e.g. "nouns",
>> "verbs")....
>>
>> Re "BTW, regarding that "parsing" aspect, what is the term used to
>> describe the gradual process of "terminological inception"?":
>> conceptualization? Coining of terms?
>> In my view, "lexical priming" is different from "terminological
>> inception".
>>
>> Re "How could you clarify intersubjectivity?":
>> https://en.wikipedia.org/wiki/Intersubjectivity :)
>> Your question is way too broad, or requires an answer that is equally
>> broad, which I cannot entertain at the moment.
>>
>> Thanks for sharing your perspectives. I must admit I have not had time to
>> digest all of your points. But one impression kept recurring as I was
>> reading them:
>> sometimes, I sense that when one claims some concepts are not universal
>> (e.g. the ones mentioned in [1] above), others take it to mean that all
>> concepts are categorically invalid. That is not what I intended to
>> communicate (with all my papers, scientific work, and my comments here).
>> It is an expert opinion/finding that I shared, upon some careful evaluation.
>>
>>
>> On Tue, Aug 1, 2023 at 10:26 PM Albretch Mueller via Corpora <[email protected]> wrote:
>>
>>> On 7/31/23, Ada Wan <[email protected]> wrote:
>>> > That having been expressed, here are a couple of points re RML that one should pay heed to:
>>> > i. to what extent and in what context is this technology relevant?
>>>
>>> If you were able to devise an algorithm which, taking as input only NL
>>> texts (composed of: a) a start (semantic end); b) a sequence of
>>> characters from a relatively large and representative text bank; c) an
>>> end (a semantic start)), is able to exhaustively "deduce" the grammar
>>> of such texts, in addition to being able to use it with any language,
>>> you would then:
>>>
>>> 1) have defined a "space"/"coordinate system" for those texts, to
>>> frame (pretty much) all possible "meaningful 'points'"/"phrases" in
>>> terms of such grammar, which would also;
>>> 2) be a 0-search structure describing the text bank/corpus (every
>>> text segment would also become a pointer to every single actualization
>>> of that very segment in all texts, no more "n-grams" necessary!),
>>> which could;
>>> 3) be used with minimal turking/supervision to:
>>>  3.1) clean up all automatic translations from YouTube;
>>>  3.2) keep multilingual corpora;
>>>  3.3) use it for automatic translations (demonstrably, in an almost
>>> foolproof, perfect way, since you always have the words/phrases with
>>> their context);
>>>  3.4) "cosmic/tree reading": instead of reading books/sequences of
>>> characters, you would read that text as it relates to all other texts
>>> from the same topic;
>>>  3.5) parsing: you would keep a corpus of what you know so you won't
>>> have to reread about certain topics and aspects you already know
>>> (great Lord! how I hate reading a whole book to only find a few, at
>>> times marginal, sentences worth reading! or that "youthful" thing of
>>> thinking that they just discovered/created an idea because they are
>>> just verbalizing it or made a movie about it!) BTW, regarding that
>>> "parsing" aspect, what is the term used to describe the gradual
>>> process of "terminological inception"? I have heard the term
>>> "Adamization", but, even though that word doesn't really rub me the
>>> wrong way, I could imagine it is "too sexist" to some people.
>>> I wouldn't really mind calling it Eveization or "pussyfication" or
>>> whatever. I just don't want to use the term that the government uses,
>>> "lexical priming"; and "terminological inception" sounds too cumbersome
>>> as a verb: "terminologically incept"? doesn't sound OK in English;
>>>  3.6) of course, an easy application of that contextual parsing would
>>> be removing all that js crap and ads before they reach your awareness;
>>> ...
>>>  3.n) not last and definitely not least I am thinking hard about how
>>> to make sure police and politicians at least have a hard time while
>>> using what I have described to "freedom love" people (I know, I know,
>>> ... "3.n" doesn't "technically" pertain to quality of implementation
>>> issues ..., but I, for one, disagree. Given the "all tangible things"
>>> (tm) panopticon in which we are all living these days, each of us in
>>> one's own "virtual prison cell" to call it somehow, we should also
>>> think about, and be openly honest about, such matters)
>>>
>>> I am working right now on such a Leibnizian "characteristica
>>> universalis" kind of thing. First, I am cleansing approx. 1.2 million
>>> texts, mostly from archive.org, *.pub and the NYS Regents exams
>>> (nysedregents.org + nysl.ptfs.com) which they have, at least
>>> partially, translated into more than 10 languages. Is that relevant
>>> enough to you? ;-) I am also being quite selfish about it because I
>>> have always dreamed of being able to "read"/mind all texts which have
>>> ever been written in the same way that teens think they have to have
>>> sex with everybody in town to make sense of things.
>>>
>>> > ii. one can certainly dissect/decompose texts ...
>>>
>>> Computing power has become insanely cheap, but it has also enabled
>>> too much "cleverhansing" out there. The Delphic phrase: "you can make
>>> sense or money" these days translates as some sort of corollary to:
>>> "using computers and then thinking about it makes you smart"; but,
>>> does it really?
>>>
>>> It amazes me how easily you can "dissect"/"decompose texts", talk
>>> about "tensors", "vectors", ... (I am not trying to police language
>>> usage, it just amazes me); let alone all the insufferable bsing claims
>>> by the "Artificial Intelligentsia".
>>>
>>> I would go with one character after the other and an open attempt to
>>> use the minimal number of principles to then see what I get. IMO, when
>>> you start getting too smart about what you do, of course, you will
>>> "see" how smart you are. The poet in me likes Borges' stanzas: "... el
>>> nombre es arquetipo de la cosa, en las letras de 'rosa' está la rosa y
>>> todo el Nilo en la palabra 'Nilo'" ("its name is a thing's archetype,
>>> in the letters of 'rose' is the rose and the whole of the Nile (river)
>>> in the word 'Nile'")
>>>
>>> > II. Re ""magical" in the sense that when we go about our intersubjective business": some intersubjectivity can be further clarified. I don't see many of your examples as being "magical".
>>>
>>> I actually do! How could you clarify intersubjectivity? I am trying
>>> to do so (somewhat) Mathematically (to the extent you could). Could
>>> you share any papers, "prior art" on such matters?
>>>
>>> > ii. "other people may read, mind, as well ...;": so?
>>>
>>> which is a good thing; it is alright, fine and dandy in the hippie way,
>>> I meant.
>>>
>>> > iii. "Alice bought some veggies from Bob, ...)": this I don't understand.
>>> > iv. "We see more in money ("words", ...) than just a piece of paper"
>>>
>>> iii. and iv. overlap to some extent so I will try to explain them
>>> both quickly (which is impossible since you can write philosophies
>>> about each line, but there I'll go).
>>> To understand what Marx (may have) meant by „gesellschaftlich
>>> notwendige Arbeit” ("socially necessary labour time", wording which
>>> has made quite a few go berserk ever since):
>>>
>>> https://en.wikipedia.org/wiki/Socially_necessary_labour_time
>>>
>>> https://en.wikipedia.org/wiki/Transformation_problem
>>>
>>> you have to understand the basic mathematical concepts of:
>>>
>>> a) combined rates, and
>>> b) intratextual systems of linear equations
>>>
>>> Based on my teaching experience §b is easier to understand. Sorry I
>>> couldn't find an "easier" explanation on YouTube of that type of SLEs
>>> than the one I used with my students preparing for the Regents:
>>>
>>> https://ergosumus.files.wordpress.com/2018/10/sle04-en.pdf
>>>
>>> The intratextuality of those problems matters to corpora research
>>> because different strata of "like terms" ("verbs", "adjectives", ...)
>>> are what create grammar. "Crazy me" thinks you could to some extent
>>> describe the "likeness of terms" underlying grammar!
>>> ~
>>> I also have a guideline about combined rates which I successfully
>>> used with my students:
>>>
>>> https://ergosumus.files.wordpress.com/2018/06/word_problems12-en00.pdf
>>> ~
>>> What the eff do combined rates and SLEs have to do with Marx's
>>> transformation problem? ;-)
>>>
>>> Well, notice that the -equitable aspect- used to solve combined rates
>>> problems is the time (regardless of how differently fast one "works"
>>> in comparison with others). There is also another type of combined
>>> rates problem: you drive to some place with a friend who doesn't care
>>> about driving fast, but you need to rest so she drives for a while ...
>>> that problem is different from two people meeting at a place each
>>> driving "in their own cars" (at their own average speed).
>>>
>>> Serge Heiden shared a paper about presidential debates which could
>>> also be Mathematically studied as a CR kind of problem (even if
>>> politicians, as the crowd management clowns they all are, don't have to
>>> make sense, anyway), but as it happens with any dialogue there are
>>> parts of the conversations in which both the cars and the time are
>>> shared and other times when only (or more of) the time is. I don't know
>>> of a general Mathematical formulation of CR kinds of problems which
>>> could be used for corpora research. On my "to do" list I have writing
>>> papers studying Euclid's Elements and Plato's Dialogues in that way.
>>>
>>> Karl Marx, as part of his „Wertgesetz der Waren” (rechristened in
>>> English as "labor theory of value"), somewhat metaphorically stated
>>> that the exchange value of a commodity is a function of "society's
>>> labour-time". He also rendered his ideas as equations (in more of a
>>> verbally descriptive, metaphorical way), but that phrase, "society's
>>> labour-time", was and still is found to be anywhere from questionable
>>> to unfalsifiably wild. I don't claim to have mind reading powers, but I
>>> think in his letter to his friend Ludwig Kugelmann, thoroughgoing
>>> Hegelian that Marx was, he clearly explained what he meant (page: 222
>>> in file, 208 in book):
>>>
>>>
>>> https://archive.org/download/marxengelsselectedcorrespondence/Marx%20%26%20Engels%2C%20Selected%20Correspondence.pdf
>>>
>>> Marx To Ludwig Kugelmann In Hanover London, July 11, 1868:
>>> All that palaver about the necessity of proving the concept of value
>>> comes from complete ignorance both of the subject dealt with and of
>>> scientific method. Every child knows that a nation which ceased to
>>> work, I will not say for a year, but even for a few weeks, would
>>> perish. Every child knows, too, that the masses of products
>>> corresponding to the different needs require different and
>>> quantitatively determined masses of the total labour of society.
>>> That this necessity of the distribution of social labour in definite
>>> proportions cannot possibly be done away with by a particular form of
>>> social production but can only change the mode of its appearance, is
>>> self-evident. No natural laws can be done away with. What can change
>>> in historically different circumstances is only the form in which
>>> these laws assert themselves. And the form in which this proportional
>>> distribution of labour asserts itself, in a state of society where the
>>> interconnection of social labour is manifested in the private exchange
>>> of the individual products of labour, is precisely the exchange value
>>> of these products.
>>> ~
>>> So, as I see it, in a Hegelian way, Marx was seeing the whole of
>>> society as a corpus (in which we all live through our own
>>> texts/narratives) talking about "socially necessary labour time" in
>>> the way that "time" becomes the equitable aspect shared when
>>> people/(-society as a whole-) work together as described by combined
>>> rates kinds of problems.
>>>
>>> When "Alice buys some veggies from Bob, ..." she uses money as the
>>> "equitable aspect" to get Bob's veggies (in the Marxian way they were
>>> both part of a combined rates problem) and you tell me this is not
>>> magical!
>>>
>>> > v. "some transactional electronic ("air"...) excitations": I don't get this.
>>>
>>> you may pay with cash using coins or bills or using your debit card
>>> which at the end of the day become transactional electronic
>>> excitations on some hard drives. When you speak there is more to it
>>> than vibrations/fluctuations of air. (I am referring to the medium
>>> which Saussurean signifiers use)
>>>
>>> > vi. "your 'magic' and mine are different we are still able to 'communicate'. How on earth do such things happen?": a disclaimer: I am not using any magic in my attempts to communicate with you here.
>>> > I try my best to place myself in your shoes to guesstimate the points that you are trying to get across. But many (as you can see above) didn't quite reach me.
>>>
>>> "I try my best to place myself in your shoes" ... ;-) Ha, ha, ha!
>>> that is just a functional illusion. What do you know about "my shoes"?
>>> I work as a gardener (which I love to do) so they are dirty and
>>> smelly, ... I also love to eat garlic ... As I see things standing on
>>> "my dirty and smelly shoes and voicing it from my garlicky mouth"
>>> being honest and true to matters is good enough.
>>>
>>> lbrtchx
>>> _______________________________________________
>>> Corpora mailing list -- [email protected]
>>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>>> To unsubscribe send an email to [email protected]
