:-)
HPPC.
We'll see who gets where first.
--benson
On Fri, Apr 2, 2010 at 10:06 AM, Dawid Weiss dawid.we...@gmail.com wrote:
What's the use case for needing to vary the hash function? It's one of
those things where I assume there are incorrect ways to do it, and
correct ways
I'm neutral... maybe let it marinate longer in Mahout, prove it's used
and worthwhile and such?
Yeah, I'd tend to agree here. Let's see if we get some contributions on it
and how it plays out for us.
Marination is exactly why I work on HPPC separately
from Mahout... Once you
What's the use case for needing to vary the hash function? It's one of
those things where I assume there are incorrect ways to do it, and
correct ways, and among the correct ways fairly clear arguments about
which function will be better -- i.e. the object should provide the
best function.
I am by no means an expert in English, but it seems the double l is
etymologically justified:
ORIGIN Latin, from collocare ‘place together’.
(Oxford dictionary).
Dawid
On Wed, Mar 17, 2010 at 4:28 PM, Drew Farris drew.far...@gmail.com wrote:
In the corpus-linguistics sense collocation is
Thanks Ted, will try it.
D.
On Wed, Feb 24, 2010 at 12:09 AM, Ted Dunning ted.dunn...@gmail.com wrote:
barely. This article might help:
http://unitstep.net/blog/2009/05/18/resolving-log4j-1215-dependency-problems-in-maven-using-exclusions/
On Tue, Feb 23, 2010 at 12:20 PM, Dawid Weiss
There are many folks knowledgeable about maven on this list, so I
thought I'd ask -- I'm trying to write a POM with scp deployment, but
maven consistently fails for me with authentication errors -- this is
most likely caused by an outdated (and buggy) jsch dependency (0.1.38
instead of 0.1.42).
manage to set up an ftp deployment, so this is not crucial for
me anymore, but I'd rather have scp than ftp.
Thanks,
Dawid
~/.m2/settings.xml file to incorporate your public key, etc. I think
Mahout has it configured. Check the How To Release page on the Wiki.
On Feb 23, 2010, at 3:16 AM, Dawid
1. We'd like to organize several subprojects we wish to introduce (Core,
NLP, Recommenders/Taste, Ports - C++, etc.) that wouldn't really fit as
Lucene subprojects.
And the collections package, vectors, verification and evaluation
code, potential test data sets... yes, makes sense to make
I wrote a post about this a while ago. You need to use the 1.6
compiler, but set the target to 1.5 -- this way you can keep @Override
annotations, but emit valid 1.5 code anyway. I don't know about Maven
(javac), but it definitely works in Eclipse (can be set manually via
project properties).
D.
9:00 AM, Dawid Weiss dawid.we...@gmail.com wrote:
I wrote a post about this a while ago. You need to use the 1.6
compiler, but set the target to 1.5 -- this way you can keep @Override
annotations, but emit valid 1.5 code anyway. I don't know about Maven
(javac), but it definitely works
was in the pom.xml file. But it worked. Until like yesterday. Color me
confused.
Perhaps you had these classes compiled from the previous runs (when
1.6 flag was on)? I don't see how it could work with javac.
Oh, the third option is to remove @Override; it's not that useful anyway.
D.
Your experience is the reverse of mine. In maven, no javac complaints.
In eclipse, plenty-o-complaints.
Ooops, had the wrong class as the test. Correct, just tried to compile this:
public class OverrideIsHell
{
public interface A
{
public void a();
}
public class B
It's only some @Overrides -- those on interfaces. My vote is to
eliminate them from the 1.5-compatible projects.
+1 from me.
D.
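For readers following the @Override discussion, here is a minimal sketch of the construct in question. The class and method names are made up for illustration and are not taken from Mahout. As far as I know, the JDK 5 compiler (and Eclipse at compliance level 1.5, as reported in this thread) rejects the annotation on a method that only implements an interface method, while Java 6 and later accept it.

```java
// Illustrative only: names are hypothetical, not Mahout code.
class OverrideOnInterfaceSketch {
    interface A {
        String a();
    }

    static class B implements A {
        // Accepted by Java 6 and later. The JDK 5 compiler reports an error
        // here, because a() implements an interface method rather than
        // overriding a superclass method.
        @Override
        public String a() {
            return "B.a";
        }

        // Accepted everywhere: toString() overrides a method of Object.
        @Override
        public String toString() {
            return "B";
        }
    }

    public static void main(String[] args) {
        A instance = new B();
        System.out.println(instance.a());
    }
}
```

Removing the annotation from `a()` changes nothing at run time; it only gives up the compile-time check.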
Kill it. Shuffling can be easily done externally should somebody need it.
Dawid
On Sun, Feb 7, 2010 at 7:43 AM, Ted Dunning ted.dunn...@gmail.com wrote:
+1
On Sat, Feb 6, 2010 at 5:12 PM, Jake Mannix jake.man...@gmail.com wrote:
Kill it if you don't see internal use.
+1
-jake
On Feb
Hi Benson,
Apologies for my recent inactivity on this -- urgent family matters in
the form of a 60cm little newborn...
Just to share some of my thoughts on the collections stuff. I will
still wait for a numbered release of Mahout with colt and collections
in -- we need this to proceed with
The implementation in Colt is correct: it is double hashing, and the
value of the second hash is always relatively prime to the first one
(and must not be zero). Colt's implementation can be rewritten as:
const_increment = 1 + h % (m - 2);
if you do a loop
while (true)
{
slot = (slot +
) {
index += hashSize - jump;
} else {
index -= jump;
}
currentKey = keys[index];
}
return index;
}
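The probing scheme described above can be sketched as follows. This is not Colt's code: the class and method names are hypothetical, and the sketch assumes a non-negative hash `h` and a prime table size `m` greater than 2. Under those assumptions the increment `1 + h % (m - 2)` is never zero and is smaller than `m`, so the probe sequence visits every slot once before repeating.

```java
import java.util.Arrays;

// A sketch of a double-hashing probe sequence; names are hypothetical.
class DoubleHashingProbeSketch {
    // Returns the first `count` slots probed for hash h in a table of size m.
    // Assumes h >= 0 and m is a prime greater than 2.
    static int[] probeSequence(int h, int m, int count) {
        int slot = h % m;                 // first hash: the starting slot
        int increment = 1 + h % (m - 2);  // second hash: always in [1, m - 2]
        int[] sequence = new int[count];
        for (int i = 0; i < count; i++) {
            sequence[i] = slot;
            slot = (slot + increment) % m;
        }
        return sequence;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probeSequence(25, 11, 11)));
    }
}
```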
On Wed, Jan 27, 2010 at 8:38 AM, Dawid Weiss dawid.we...@gmail.com wrote:
The implementation in Colt is correct: it is double hashing, and the
value
Ooops, apologies, didn't analyze this condition properly, you're
right, it will go past REMOVED:
while (currentKey != null && (currentKey == REMOVED ||
!key.equals(currentKey))) {
But then -- the same thing applies to put; if you don't find the key
in the map and there is a removed slot on the
Yep, that's a good point. It'll involve a little copy-and-paste to
implement this alternate way of looking for a slot efficiently, but
probably worth it.
Depends if you're doing interleaved put/gets, but it may be, yes. I
like the guard sentinel object for marking removed keys --
I think I'll
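To make the REMOVED discussion concrete, here is a minimal sketch of an open-addressing set with a sentinel for deleted slots. It is not the Colt or Mahout implementation; every name in it is hypothetical, it never resizes, and it assumes a prime capacity greater than 2. Lookups probe past REMOVED slots, and insertion remembers the first REMOVED slot on the probe path so it can be reused once the key is known to be absent.

```java
// A sketch only: hypothetical names, fixed capacity, no resizing.
class TombstoneSetSketch {
    // Sentinel marking a slot whose key was deleted; lookups probe past it.
    private static final Object REMOVED = new Object();
    private final Object[] keys;

    // Assumes the capacity is a prime greater than 2.
    TombstoneSetSketch(int primeCapacity) {
        keys = new Object[primeCapacity];
    }

    private int start(Object key) {
        return (key.hashCode() & 0x7fffffff) % keys.length;
    }

    private int step(Object key) {
        return 1 + (key.hashCode() & 0x7fffffff) % (keys.length - 2);
    }

    // Returns the slot holding the key, or -1 if the key is absent.
    int indexOf(Object key) {
        int slot = start(key);
        int step = step(key);
        for (int probes = 0; probes < keys.length; probes++) {
            Object current = keys[slot];
            if (current == null) {
                return -1; // a never-used slot ends the probe chain
            }
            if (current != REMOVED && current.equals(key)) {
                return slot;
            }
            slot = (slot + step) % keys.length; // REMOVED or another key: keep probing
        }
        return -1;
    }

    // Adds the key, reusing the first REMOVED slot seen on the probe path.
    boolean add(Object key) {
        int slot = start(key);
        int step = step(key);
        int firstRemoved = -1;
        for (int probes = 0; probes < keys.length; probes++) {
            Object current = keys[slot];
            if (current == null) {
                break; // key is absent; slot is free
            }
            if (current == REMOVED) {
                if (firstRemoved < 0) {
                    firstRemoved = slot;
                }
            } else if (current.equals(key)) {
                return false; // already present
            }
            slot = (slot + step) % keys.length;
        }
        int target = firstRemoved >= 0 ? firstRemoved : slot;
        if (keys[target] != null && keys[target] != REMOVED) {
            throw new IllegalStateException("table is full");
        }
        keys[target] = key;
        return true;
    }

    boolean remove(Object key) {
        int slot = indexOf(key);
        if (slot < 0) {
            return false;
        }
        keys[slot] = REMOVED;
        return true;
    }

    public static void main(String[] args) {
        TombstoneSetSketch set = new TombstoneSetSketch(7);
        set.add(1);
        set.add(36);
        set.add(71);    // 1, 36 and 71 share the same probe path in a table of 7
        set.remove(36); // leaves a REMOVED marker in the middle of that path
        System.out.println(set.indexOf(71) >= 0);
        set.add(106);   // same probe path again; lands in the REMOVED slot
        System.out.println(set.indexOf(106));
    }
}
```

Whether the extra bookkeeping in `add` pays off depends, as noted above, on how often puts and removes are interleaved.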
Hi Benson (and others),
Say, when you moved the code from Apache Harmony, did you modify it
along the way, or is it what's found in Harmony's
source code? I'm asking because we're still hitting those array out of
bounds exceptions sometimes. They are tough to isolate, so I reverted
to
[
https://issues.apache.org/jira/browse/MAHOUT-266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated MAHOUT-266:
---
Attachment: MAHOUT-266.patch
Patch that solves the issue.
Broken Sorting can result in AIOOB
Reporter: Dawid Weiss
Attachments: MAHOUT-266.patch
The sorting condition is checked too eagerly; probably a typo while porting
from Harmony (all other sorting routines have similar pattern except this one).
--
This message is automatically generated by JIRA.
-
You can reply to this email
This looks like a bug, it's a different pattern everywhere else. See my patch.
D.
On Mon, Jan 25, 2010 at 4:16 PM, Benson Margulies bimargul...@gmail.com wrote:
Well,
I deleted my Harmony tree, I'll refetch it tonight and check.
--benson
On Mon, Jan 25, 2010 at 8:26 AM, Dawid Weiss
It's consistent with standard Java library. I guess it does not matter
much, unless you have a really weird distribution of the input values.
D.
On Mon, Jan 25, 2010 at 4:13 PM, Benson Margulies bimargul...@gmail.com wrote:
Why do you think they decided that the best hash function for an int
for Some Reason.
On Mon, Jan 25, 2010 at 10:46 AM, Sean Owen sro...@gmail.com wrote:
Dumb question, what would be better?
On Jan 25, 2010 3:24 PM, Dawid Weiss dawid.we...@gmail.com wrote:
It's consistent with standard Java library. I guess it does not matter
much, unless you have a really weird
I strongly support this -- ironically, we in Carrot2 also need such a
release (versioned, with Maven artefact to refer to). HPPC works more
than fine for us, but portions of the code are bound to Colt and we
can't easily switch all of it to HPPC yet.
I'd apply that patch for sorting first though,
(Out of curiosity what does the distribution have to do with it --
what's a distribution for which something besides identity is better?)
Oh, I don't know... let's say you know that your keys are always even,
then you could have a hash that divides by two so that you avoid
collisions in the
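The even-keys case can be illustrated with a small sketch. It assumes a table whose size is a power of two and whose slot is computed by masking the hash, which the thread does not state; the names are hypothetical. With the identity hash, even keys can only land in even slots, so half of the table is never used; halving the key first spreads the same keys over all slots.

```java
import java.util.HashSet;
import java.util.Set;

// A sketch under the assumption of a power-of-two table with masking.
class EvenKeyHashSketch {
    // Counts how many distinct slots the keys occupy in a table of the given size.
    static int distinctSlots(int[] keys, int tableSize, boolean halveKey) {
        Set<Integer> slots = new HashSet<Integer>();
        for (int key : keys) {
            int hash = halveKey ? (key >>> 1) : key; // identity hash vs. key / 2
            slots.add(hash & (tableSize - 1));       // tableSize must be a power of two
        }
        return slots.size();
    }

    public static void main(String[] args) {
        int[] evenKeys = {0, 2, 4, 6, 8, 10, 12, 14};
        System.out.println("identity: " + distinctSlots(evenKeys, 8, false));
        System.out.println("halved:   " + distinctSlots(evenKeys, 8, true));
    }
}
```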
[
https://issues.apache.org/jira/browse/MAHOUT-266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated MAHOUT-266:
---
Attachment: AIOOBInSortingTest.java
Definitely a bug. Attaching a test case from Carrot2 that causes
On Wed, Jan 20, 2010 at 10:57 AM, Dawid Weiss dawid.we...@gmail.com wrote:
I must have compiled to 1.5-bytecode, but using 1.6 standard library.
There are calls to Arrays#copyOf and, as far as I can tell, it's the
only thing there that is 1.6-specific. Will file a patch for this.
Dawid
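For context, `Arrays.copyOf` was added in Java 6, so code that calls it compiles to 1.5 bytecode but fails on a 1.5 runtime. A replacement that relies only on `System.arraycopy` might look like the sketch below; the helper name is hypothetical and this is not necessarily what the actual patch does.

```java
import java.util.Arrays;

// A sketch of a Java 1.5-compatible stand-in for Arrays.copyOf(int[], int).
class CopyOfCompat {
    // Copies the array, truncating or padding with zeros to newLength.
    static int[] copyOf(int[] original, int newLength) {
        int[] copy = new int[newLength];
        System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength));
        return copy;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3};
        System.out.println(Arrays.toString(copyOf(values, 5)));
        System.out.println(Arrays.toString(copyOf(values, 2)));
    }
}
```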
On Wed
[
https://issues.apache.org/jira/browse/MAHOUT-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803346#action_12803346
]
Dawid Weiss commented on MAHOUT-264:
Because these methods in java.util.Arrays have
I have integrated HPPC collections with our open source and commercial
stuff, replacing PCJ. All tests pass, which is a good sign in addition
to the tests already included in HPPC.
The code is temporarily released in Carrot2 SVN at:
Hi. Is it possible to compile mahout-math in 1.5-compatibility mode?
This would require adding compiler plugin rules to POM. Mahout-math
does not use any of the Java 1.6-specific API, I checked.
Dawid
I must have compiled to 1.5-bytecode, but using 1.6 standard library.
There are calls to Arrays#copyOf and, as far as I can tell, it's the
only thing there that is 1.6-specific. Will file a patch for this.
Dawid
On Wed, Jan 20, 2010 at 7:14 PM, Dawid Weiss dawid.we...@gmail.com wrote:
Gee, I
Issue Type: Wish
Components: Math
Reporter: Dawid Weiss
Assignee: Benson Margulies
Priority: Minor
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
: Utils
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
A proposal for template-driven collections library (lists, sets, maps, deques),
with specializations for Java primitive types to save memory and increase
performance. The templates are regular
[
https://issues.apache.org/jira/browse/MAHOUT-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated MAHOUT-253:
---
Attachment: hppc-1.0-dev.zip
Proposal for high performance primitive collections
I propose a branch. Diffs from the branch to the trunk can still be
posted on the JIRA, but I think that a branch would be worthwhile in
facilitating collaboration.
Do you mean -- for merging with the code I posted earlier?
By the way, I've integrated Colt from Mahout with our code base.
Have you finished with Colt? I think this is still worth completing
before we proceed to HPPC. Just talked to Staszek, we will move HPPC
code to Carrot2 labs SVN repository (sourceforge) because we want to
get rid of PCJ as soon as possible and need something versioned and
sticky. I plan to make a
Congratulations, Benson!
D.
On Wed, Jan 13, 2010 at 9:28 PM, Grant Ingersoll gsing...@apache.org wrote:
The Lucene PMC is pleased to welcome the addition of Benson Margulies as a
committer on Mahout. I hope you'll join me in offering Benson a warm welcome.
Benson, Lucene tradition is that
Let's do this, guys: I have finished the implementation of basic data
structures. I will try to merge this code with Carrot2, replacing PCJ;
this should give me an additional level of confidence that everything
is working fine. I plan to have this step done by Friday.
Then, I will make this code
://issues.carrot2.org/browse/CARROT-614
Dawid
On Thu, Jan 14, 2010 at 9:07 AM, Dawid Weiss dawid.we...@gmail.com wrote:
Let's do this, guys: I have finished the implementation of basic data
structures. I will try to merge this code with Carrot2, replacing PCJ;
this should give me an additional
that was not related to
LGPL). I think that Jake and Sean get the credit for the heavy
lifting.
On Thu, Jan 14, 2010 at 6:52 AM, Dawid Weiss dawid.we...@gmail.com wrote:
Oh, as a side note to Benson -- your effort on porting these COLT
collections is appreciated from more than one angle
Hi guys,
I see Benson working really hard on converting Colt primitive
collections to Mahout -- this is a great effort, really, since no such
library currently exists with an Apache or BSD license.
I wanted to ask you if compatibility with Java Collections is
something you consider crucial for a
Nice, Jeff. Darn, I need to post a nicer picture of myself, the current one has
a penitentiary feel to it ;)
D.
[snip]
a web crawler. By doing this, a crawler, for instance, can use the
output of the classification to only follow certain links that lie on
informative content parts.
Is this interesting? Does it make sense to you guys?
Hi Samuel. This would be of great interest for the Nutch folks, I
Hi guys. Not much activity from me -- really ashamed of it, but swamped in other
duties. Anyway, downloaded mahout-0.1-project.tar.bz2 and (OpenSuSE 10.3):
tar -jxf *.bz2
gives a warning:
tar: A lone zero block at 14473
Running mvn:install (Maven 2.0.9) hangs for a long time on one of the
tar: A lone zero block at 14473
I assume you are on a Mac? I get that too, but it always seems to be fine.
Nope, it's OpenSuSE (Linux), 64-bit. I've seen these warnings with gzip and
bzip-compressed tar files occasionally, but they never meant anything that would
indicate data corruption.
Cheers Ted, good to have you. How many projects can one man handle? You're a
machine, mate! :)
Dawid
Grant Ingersoll wrote:
Hi Mahouters,
I'm pleased to announce that the Lucene PMC has elected Ted Dunning as a
committer for Mahout (what, you mean he wasn't already? :-) ) As is
Carrot2 is for clustering web search results -- it's not exactly the same thing.
D.
shunkai.fu wrote:
There is one project called Carrot2 focusing on this topic already.
-Original Message-
From: Marko Novakovic [mailto:[EMAIL PROTECTED]
Sent: March 27, 2008 7:03
To: mahout-dev@lucene.apache.org
This is absolutely necessary, if not for just showing off with the project, then
certainly for verification of correctness of algorithms inside it.
I will certainly join such a subtask to the extent that my currently available
time allows (not much, sadly).
D.
Grant Ingersoll wrote:
Good points, Andrzej.
* it should be meaningful and acceptable for the target audience, or
abstract enough that it doesn't matter. I'm not sure how the IBM-type
suits would react to the beach-ball if it were to appear in the
documentation of their product ;)
Oh, the collar worker
I would still wait a bit until the code we have is actually put to use. Having
some real-time applications and demos is the best way to convince people the
project has a future.
D.
Grant Ingersoll wrote:
What I would do is ask on Hadoop if there is interest in making it a
subproject. In
I have no problem with your proposal, but have not tested it. If the unit
tests still run then go ahead and commit it. If this means we no longer need
They do run fine. No need to provide the JAR name (this has been included in the
patch).
D.
Jeff, did you have a chance to try it? Can we close this issue?
D.
Dawid Weiss wrote:
Hi Jeff,
Like I said -- it seems that this issue is actually quite trivial to
solve by changing to the context class loader. See attached patch at
MAHOUT-13. Please check if it works (I did some testing
[
https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577037#action_12577037
]
Dawid Weiss commented on MAHOUT-6:
--
A quickie:
1. Make many, many rounds through the same
[
https://issues.apache.org/jira/browse/MAHOUT-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576749#action_12576749
]
Dawid Weiss commented on MAHOUT-10:
---
Hey guys. What's our committing policy? Can I commit
[
https://issues.apache.org/jira/browse/MAHOUT-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss resolved MAHOUT-10.
---
Resolution: Fixed
Applied to trunk.
Replace fall-through exception handlers with propagated
[
https://issues.apache.org/jira/browse/MAHOUT-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss resolved MAHOUT-12.
---
Resolution: Fixed
Implemented in trunk.
Point formatting and parsing improved (StringBuilder
I concur that we ought to have additional Writable representations to
make intra-Hadoop transfers more streamlined. This is certainly *not*
too late to pursue. I would encourage you to propose a record for Point
(which is in trunk) and these could be added to Vector and Matrix later
(once we
.
plugin.system.issuetabpanels:all-tabpanel ]
Dawid Weiss reassigned MAHOUT-11:
-
Assignee: Dawid Weiss
Static fields used throughout clustering code (Canopy, K-Means).
Key: MAHOUT
What about encouraging your students to submit their work at Mahout? Just a
naive thought of mine.
Those students I'm in charge of have their area of interest defined already --
too late to change it. Good idea for the future, I have been thinking about it,
actually.
D.
Hi guys,
I just looked at the code and noticed you use Class-relative classloader:
Class cl = Class.forName(job.get(DISTANCE_MEASURE_KEY));
This is effectively an attempt to load a class using the class loader of the
caller's class (the class loader is loaded via
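The alternative, loading through the thread's context class loader, could look roughly like the sketch below. The class and method names are hypothetical and this is not the MAHOUT-13 patch itself; it only uses the standard `Thread.getContextClassLoader()` and the three-argument `Class.forName`, and it falls back to the defining class loader when no context loader is set.

```java
// A sketch of context-class-loader lookup; names are hypothetical.
class ContextClassLoaderSketch {
    static Class<?> loadClass(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // No context loader set: fall back to the loader of this class.
            loader = ContextClassLoaderSketch.class.getClassLoader();
        }
        try {
            return Class.forName(className, true, loader);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class not found: " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadClass("java.util.ArrayList").getName());
    }
}
```

The practical difference shows up when the calling class and the class named in the job configuration live in different JARs loaded by different class loaders.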
[
https://issues.apache.org/jira/browse/MAHOUT-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated MAHOUT-10:
--
Attachment: mah-10.patch
Patch replacing printStackTrace with rethrowing of a RuntimeException
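The general idea behind this kind of change can be sketched as follows. This is an illustration with hypothetical names, not an excerpt from mah-10.patch: a checked exception is wrapped in a RuntimeException, keeping the original as the cause, instead of being printed and swallowed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// A sketch of propagating failures instead of printing them; names are hypothetical.
class PropagateExceptionSketch {
    static String readFirstLine(Reader source) {
        try {
            return new BufferedReader(source).readLine();
        } catch (IOException e) {
            // Previously: e.printStackTrace(); and execution carried on.
            // Rethrowing makes the failure visible to the caller.
            throw new RuntimeException("Could not read from source", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readFirstLine(new StringReader("first\nsecond")));
    }
}
```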
Components: Clustering
Affects Versions: 0.1
Reporter: Dawid Weiss
I file this as a bug, even though I'm not 100% sure it is one. In the current
code the information is exchanged via static fields (for example, distance
measure and thresholds for Canopies are static field
I changed the main's to pass in the location of the jar, since the ANT
task puts the jar in basedir/dist. I made a comment about it on
Mahout-3. The Canopy driver should do the right thing? I also did
the same thing w/ the k-means.
I honestly don't think the JAR file must be
Issue Type: Improvement
Components: Clustering
Affects Versions: 0.1
Reporter: Dawid Weiss
Priority: Trivial
Added test case to point class, improved parsing (no need to recompile the
pattern all over again) and concatenation of points
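As an illustration of the two techniques mentioned here, the sketch below compiles the separator pattern once and builds the output with a StringBuilder. The bracketed, comma-separated text format and all names are assumptions made for the example; the actual Point format in Mahout may differ, and empty points are not handled.

```java
import java.util.regex.Pattern;

// A sketch with an assumed "[x, y, ...]" format; names are hypothetical.
class PointFormatSketch {
    // Compiled once and reused across calls.
    private static final Pattern SEPARATOR = Pattern.compile("\\s*,\\s*");

    // Parses text such as "[1.0, 2.5]" into coordinates.
    static double[] parse(String text) {
        String body = text.trim();
        body = body.substring(1, body.length() - 1).trim(); // drop the brackets
        String[] parts = SEPARATOR.split(body);
        double[] point = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            point[i] = Double.parseDouble(parts[i]);
        }
        return point;
    }

    // Formats coordinates with a single StringBuilder.
    static String format(double[] point) {
        StringBuilder builder = new StringBuilder("[");
        for (int i = 0; i < point.length; i++) {
            if (i > 0) {
                builder.append(", ");
            }
            builder.append(point[i]);
        }
        return builder.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(format(parse("[1.0, 2.5]")));
    }
}
```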
[
https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss reassigned MAHOUT-11:
-
Assignee: Dawid Weiss
Static fields used throughout clustering code (Canopy, K-Means
[
https://issues.apache.org/jira/browse/MAHOUT-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss reassigned MAHOUT-10:
-
Assignee: Dawid Weiss
Replace fall-through exception handlers with propagated unchecked
[
https://issues.apache.org/jira/browse/MAHOUT-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss reassigned MAHOUT-13:
-
Assignee: Dawid Weiss
Investigate Mahout jar loading
+1 for the basic version. My comment about fonts still applies (I'd prefer more
regular glyphs, not hand-drawn).
D.
Lukas Vlcek wrote:
Hi,
Here is my second proposal for the Mahout logo:
Basic version -
http://picasaweb.google.com/lukas.vlcek/Mahout/photo#517229904094802
Horizontal
+1 from me as well. Non-MR patches are a starting point for MR ideas anyway.
I've been terribly busy, so I couldn't contribute so far -- really sorry about
it. Good work on the clustering algorithms, Jeff! I'll reserve some time on
Wednesday to go through the code and see if I can add
[
https://issues.apache.org/jira/browse/MAHOUT-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated MAHOUT-5:
-
Attachment: kmeans.zip
This is an implementation of k-means in non-MR form. It isn't intended