Tim Hochberg wrote:
> Keith Goodman wrote:
>
>> I have a very long list that contains many repeated elements. The
>> elements of the list can be either all numbers, or all strings, or all
>> dates [datetime.date].
>>
>> I want to convert the list into a matrix where each unique element of
>> the list is assigned a consecutive integer starting from zero.
>>
> If what you want is that the first unique element gets zero, the second
> gets one, and so on, I don't think the code below will work in general,
> since the dict does not preserve order. You might want to look at the
> results for the character case to see what I mean. If you're looking for
> something else, you'll need to elaborate a bit. Since list2index doesn't
> return anything, it's not entirely clear what the answer consists of.
> Just idx? Idx plus uL?
>
>> I've done it by brute force below. Any tips for making it faster? (5x
>> would make it useful; 10x would be a dream.)
>>
> Assuming I understand what you're trying to do, this might help:
>
>     def list2index2(L):
>         idx = ones([len(L)])
>         map = {}
>         for i, x in enumerate(L):
>             index = map.get(x)
>             if index is None:
>                 map[x] = index = len(map)
>             idx[i] = index
>         return idx
>
> It's almost 10x faster for numbers and about 40x faster for characters
> and dates. However, it produces different results from list2index in the
> second two cases. That may or may not be a good thing depending on what
> you're really trying to do.
>
Ugh! I fell victim to premature optimization disease. The following is
both clearer and faster. Sigh.
    from numpy import ones

    def list2index3(L):
        # Codes are assigned in order of first appearance: 0, 1, 2, ...
        idx = ones([len(L)])
        map = {}
        for i, x in enumerate(L):
            if x not in map:
                map[x] = len(map)
            idx[i] = map[x]
        return idx

> -tim
>
>> >>> list2index.test()
>> Numbers: 5.84955787659 seconds
>> Characters: 24.3192870617 seconds
>> Dates: 39.288228035 seconds
>>
>> import datetime, time
>> from numpy import nan, asmatrix, ones
>>
>> def list2index(L):
>>
>>     # Find unique elements in list
>>     uL = dict.fromkeys(L).keys()
>>
>>     # Convert list to matrix
>>     L = asmatrix(L).T
>>
>>     # Initialize return matrix
>>     idx = nan * ones((L.size, 1))
>>
>>     # Assign numbers to unique L values
>>     for i, uLi in enumerate(uL):
>>         idx[L == uLi,:] = i
>>
>> def test():
>>
>>     L = 5000*range(255)
>>     t1 = time.time()
>>     idx = list2index(L)
>>     t2 = time.time()
>>     print 'Numbers:', t2-t1, 'seconds'
>>
>>     L = 5000*[chr(z) for z in range(255)]
>>     t1 = time.time()
>>     idx = list2index(L)
>>     t2 = time.time()
>>     print 'Characters:', t2-t1, 'seconds'
>>
>>     d = datetime.date
>>     step = datetime.timedelta
>>     L = 5000*[d(2006,1,1)+step(z) for z in range(255)]
>>     t1 = time.time()
>>     idx = list2index(L)
>>     t2 = time.time()
>>     print 'Dates:', t2-t1, 'seconds'
>>
>> -------------------------------------------------------------------------
>> Using Tomcat but need to do more? Need to support web services, security?
>> Get stuff done quickly with pre-integrated technology to make your job easier
>> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
>> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
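For readers on a current Python, here is a minimal sketch of the same first-appearance coding. It assumes Python 3.7+ and a recent NumPy; the code in this thread is Python 2. Since Python 3.7 dicts are guaranteed to preserve insertion order, so the ordering concern raised above no longer affects the dict-based loop. The second function is a swapped-in, vectorized alternative built on numpy.unique with return_index and return_inverse. It re-ranks the sorted codes so that they follow first appearance. The names list2index_dict, list2index_unique and _timed are hypothetical, not from the thread.

```python
import datetime
import time

import numpy as np


def list2index_dict(values):
    """Return (codes, uniques): codes[i] is the first-appearance rank of values[i].

    Relies on dicts preserving insertion order (guaranteed since Python 3.7).
    """
    codes = {}
    # setdefault evaluates len(codes) before inserting, so a new key gets the
    # next consecutive integer and a repeated key returns its existing code.
    idx = np.fromiter(
        (codes.setdefault(x, len(codes)) for x in values),
        dtype=np.intp,
        count=len(values),
    )
    return idx, list(codes)


def list2index_unique(values):
    """Vectorized variant built on np.unique, re-ranked to first-appearance order."""
    arr = np.asarray(values)
    # uniq is sorted; first[k] is where uniq[k] first occurs;
    # inverse[i] is the position of arr[i] within the sorted uniq.
    uniq, first, inverse = np.unique(arr, return_index=True, return_inverse=True)
    order = np.argsort(first)            # sorted positions, listed in first-seen order
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))  # sorted position -> first-seen code
    return rank[inverse], uniq[order]


def _timed(func, data):
    # Hypothetical helper: run func on data and return (codes, elapsed seconds).
    t1 = time.perf_counter()
    idx, _ = func(data)
    return idx, time.perf_counter() - t1


if __name__ == "__main__":
    repeat = 500  # scaled down; the thread used 5000
    d = datetime.date
    step = datetime.timedelta
    cases = {
        "Numbers": repeat * list(range(255)),
        "Characters": repeat * [chr(z) for z in range(255)],
        "Dates": repeat * [d(2006, 1, 1) + step(z) for z in range(255)],
    }
    for name, data in cases.items():
        idx_a, t_a = _timed(list2index_dict, data)
        idx_b, t_b = _timed(list2index_unique, data)
        assert (idx_a == idx_b).all()  # both give first-seen codes
        print(f"{name}: dict {t_a:.3f} s, unique {t_b:.3f} s")
```

Both functions return the codes together with the unique values in first-seen order, so the "idx plus uL" question does not arise. For numeric and string input, list2index_unique moves the work into NumPy's sort. Dates become an object array, so that sort still calls Python comparisons and the gain is likely smaller. Time both on your own data before choosing.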