On Thu, Oct 14, 2010 at 12:08:09AM +0800, Michael Leonhard wrote:
> Sorry for the extremely late reply to this discussion.  I studied Japanese
> for about 4 years and now I'm learning Mandarin Chinese.

No worries!  I try to post mostly things that are still worth replying
to more than a few months later.

It's too bad mailing list software isn't very good at connecting up
replies and original posts, though.

> I think that memorization is only part of the process of internalizing
> vocabulary.  Other parts of the process are learning how to use the words
> effectively.  Many words, especially verbs, can represent many different
> things.  For example, the Chinese verb '開' (kai1) can mean 'open' (a door),
> 'turn on' (a machine), or 'drive' (a car).  Some words may refer to a
> concept that is unknown to the learner or cannot be described succinctly with
> a single word.  Additionally there are a host of rules that govern the
> appropriateness of each word in various cultural contexts.  All of this adds
> up to much more information that must be memorized.  It's surely an order of
> magnitude larger than the log of the number of words.

Hmm. That's plausible.  Polysemy is pretty universal, but it really only
hits you in the face when you're translating between quite different
languages, so it's easy to ignore it.

On the other hand, I wonder if polysemy is mostly limited to common
words: words like "just" or "about" are mind-blowingly polysemic, words
like "note" and "pattern" are somewhat polysemic, words like "wreck" and
"loft" are slightly polysemic, but words like "diehard" and "cabochon"
are barely polysemic at all.  ("Diehard", being a compound word, is
arguably more than just a single morpheme, but it has an idiomatic
meaning that can't be derived from the senses of its constituent words,
quite apart from the polysemy of "die".)

So it's clearly more than simply the number of words, but the difference
could easily be less than an order of magnitude. It might end up being a
factor of 1.5 or a factor of 3.

> A friend of mine who has studied Chinese for many years recommends using the
> SuperMemo program for memorizing vocabulary:
> http://en.wikipedia.org/wiki/SuperMemo
> It builds a model of your memory and presents content at the minimum
> frequency necessary to maintain retention.  Apparently it works well for
> vocabulary and other information that can be memorized.

Yeah, I was thinking about SuperMemo and Piotr Wozniak when I wrote the
post.  I haven't tried it, or anything else using Piotr's algorithm.
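For the curious, though: the published SM-2 version of the scheduling
rule (the early form of Piotr's algorithm, which most SuperMemo clones
implement) boils down to something like the following sketch in Python.
The function name and interface here are my own invention, not
SuperMemo's; only the constants and formulas come from the published
description.

```python
def sm2(quality, repetitions, interval, ease):
    """One review step of the SM-2 spaced-repetition algorithm.

    quality:     self-graded recall, 0 (total blackout) to 5 (perfect).
    repetitions: how many successful reviews in a row so far.
    interval:    days until the current review was due.
    ease:        the item's "ease factor" (starts at 2.5).

    Returns (repetitions, interval, ease) for scheduling the next review.
    """
    if quality < 3:
        # Failed recall: start the repetition sequence over tomorrow.
        repetitions, interval = 0, 1
    else:
        if repetitions == 0:
            interval = 1
        elif repetitions == 1:
            interval = 6
        else:
            # Each successful review stretches the interval by the
            # ease factor, so well-known items come up exponentially
            # less often -- the "minimum frequency" idea.
            interval = round(interval * ease)
        repetitions += 1
    # Nudge the ease factor by how hard the recall felt, with a
    # floor of 1.3 so intervals always keep growing.
    ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return repetitions, interval, ease
```

So a brand-new card reviewed perfectly three times in a row comes back
after 1 day, then 6 days, then roughly two weeks, and so on; a lapse
resets it to daily review.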

> I think that language acquisition is a very interesting application
> for software-assisted learning, and there are a lot of folks working
> on language education.  But I think that technical skills are more
> suited to software-assisted learning than human language.  For
> example, programming languages can be learned in a short amount of
> time and contain large "vocabularies" of standard library functions
> that are ideal for rote memorization.  When I was hired at Amazon, I
> spent several months mostly learning how to use their custom modules
> and tools.  This knowledge became stale over the next 3 years and I
> had to continually learn as systems were upgraded or replaced.  Plus I
> saw folks waste time reinventing the wheel because they refused to
> learn how to use existing modules.

That's a very good point.

In theory, all work we do in software should be either creative (the
first time someone implements XML-RPC, at the same time as defining it)
or trivial (invoking an existing XML-RPC library).  In practice, people
spend lots of time reinventing the wheel.  You've identified one reason
that people reinvent wheels: it's hard to find out what existing wheels
already exist. I talked about some others in [code reuse considered
harmful] [0].  

[0]: http://lists.canonical.org/pipermail/kragen-tol/1999-March/000359.html

Rereading this today, the only thing that reads as outdated is its
emphasis on programming in C.  Also, it only talks about cases where
code reuse appears appealing, and doesn't mention the cases where you
wouldn't even attempt it, such as the increasingly common case of having
libraries with incompatible dependencies: a JVM library like Lucene,
say, and a library that depends on some CPython module that doesn't
exist in Jython.

On the other hand, this point of view fails to consider as "work" the
time we spend on rote memorization. But you are surely right that a lack
of rote memorization is a major reason for code duplication today,
despite the massive amount of time we already spend on rote
memorization.

> I believe that Amazon and other tech companies could save money and
> increase efficiency by deploying software-assisted learning and
> retention tools.  And I think there is a business opportunity for a
> company to build and support software for this.

Certainly your first point is correct.  The second point depends on what
is and is not sellable, a subject on which my intuition is quite poor.
It's at least not obviously sellable: most tech companies don't
conceptualize their communal ignorance about their internally-developed
APIs as the serious handicap that it is, as evidenced by the state of
documentation about those APIs. You'd have to first persuade them that
they're missing big opportunities by failing to reuse code, and then
persuade them that having all their developers spend half an hour a day
on API memorization could solve that problem, and then persuade them
that your software was a more cost-effective way to do that.  All in
all, it sounds like a pretty difficult sales cycle to me.  But that
certainly doesn't mean it's impossible!

Kragen
-- 
To unsubscribe: http://lists.canonical.org/mailman/listinfo/kragen-discuss