Great, many thanks for the tips (works perfectly now) and the warning about 2.5 development in this area.
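
For anyone following the thread, here is a minimal sketch of the reverse lookup: each probeset name maps to the set of array design ids it appears on. It assumes the probeset names for each design have already been read from its CDF file, so no BASE or Fusion SDK calls are shown. The class name, method name and sample ids are made up for illustration, and this is not the exact code I am running.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ProbesetLookup {

    /**
     * Builds the reverse lookup: probeset name -> ids of all array designs
     * that contain it. The input maps an array design id to the probeset
     * names read from that design's CDF file (assumed to be loaded already).
     */
    public static Map<String, Set<Integer>> buildLookup(
            Map<Integer, List<String>> probesetsByDesign) {
        Map<String, Set<Integer>> lookup = new HashMap<>();
        for (Map.Entry<Integer, List<String>> entry : probesetsByDesign.entrySet()) {
            for (String probeset : entry.getValue()) {
                // One set per probeset, so a probeset on two arrays keeps both ids
                lookup.computeIfAbsent(probeset, k -> new HashSet<>())
                      .add(entry.getKey());
            }
        }
        return lookup;
    }

    public static void main(String[] args) {
        // Hypothetical design ids and probeset names, for illustration only
        Map<Integer, List<String>> probesetsByDesign = new HashMap<>();
        probesetsByDesign.put(1, Arrays.asList("probeset_a", "probeset_b"));
        probesetsByDesign.put(2, Arrays.asList("probeset_b", "probeset_c"));

        Map<String, Set<Integer>> lookup = buildLookup(probesetsByDesign);
        // probeset_b is on both designs, so its set holds both ids
        System.out.println(lookup.get("probeset_b"));
    }
}
```

Storing a set per probeset rather than a single Integer matters because a plain put() would silently overwrite the first design id whenever a probeset turns up on a second array.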
One array takes about 5 seconds to hash and a couple of hundred MB RAM.
(I have now implemented it as a Hashtable of HashSet<Integer> because one
probeset might be on two arrays.)

Is there any possibility that Affy array designs could have their
reporters/features in the database? I'm sorry, I don't understand Affy
files well enough to know why this can't be done in BASE as it stands. I
guess they just don't fit into ArrayDesignBlocks? This may have been
discussed to death elsewhere, sorry.

Thanks again!
Cheers,
Bob.

Nicklas Nordborg writes:
> > Thanks Nicklas for the help with boolean queries. Now I'm confused by
> > something else, also Affy-related.
> >
> > I'm trying to write code to look up which ArrayDesigns a Reporter is
> > on. For non-Affy arrays this is easy (join via arrayDesignBlocks,
> > features and reporter).
> >
> > For Affy arrays it looks like we have to use the CDF file through the
> > Affy API
> > - get all the reporter ids (e.g. ProbeSetNames) and put them in a hash
> >   for reverse lookup later.
> >
> > Here's a code snippet to do some of that (see last email for the
> > definition of affyQuery). A lot of the code is taken directly from
> > net/sf/basedb/core/Affymetrix.java, which is used by the "Affymetrix
> > CDF probeset importer" plugin for Affy array design (see the "verify
> > reporters" link on any Affy ArrayDesign page). When I run this plugin
> > as the same user who runs the code below, it works fine. However, I
> > get a NullPointerException with my code...
> >
>
> The Affymetrix.loadCdfFile() method only parses the headers. You must
> call cdf.clear() and cdf.read() to parse the entire file.
>
> I have found that the Fusion SDK is unfortunately not very informative
> when it comes to error messages and doesn't have much error handling
> either.
>
> > Hashtable affyProbeLookup = new Hashtable();
> >
> > ItemResultList<ArrayDesign> affyList = affyQuery.list(dc);
> > for (ArrayDesign ad : affyList) {
> >   int adId = ad.getId();
> >   FusionCDFData cdf =
> >     Affymetrix.loadCdfFile(Affymetrix.getCdfFile(ad));
> >   if (cdf == null) continue;
> >   int numProbesets = cdf.getHeader().getNumProbeSets();
> >   int index = 0;
> >   while (index < numProbesets) {
> >     String probesetId = cdf.getProbeSetName(index); // Line 96
> >     affyProbeLookup.put(probesetId, adId);
> >     index++;
> >   }
> > }
> >
> > Exception in thread "main" java.lang.NullPointerException
> >   at affymetrix.gcos.cdf.CDFFileData.getProbeSetName(Unknown Source)
> >   at affymetrix.fusion.cdf.FusionCDFData.getProbeSetName(Unknown Source)
> >   at base_api_test.run_test(base_api_test.java:96)
> >   at base_api_test.main(base_api_test.java:216)
> >
> > Note: the cdf object seems OK (numProbesets is set properly).
> >
> > If anyone can point me in the right direction, it would be appreciated.
> > It's a bit difficult not having the Affy source and line numbers
> > (is that available?)
>
> You'll have to ask Affymetrix for that. It all depends on how the
> package was compiled.
>
> I also have to mention that the 2.5 release will have a lot of changes
> in how Affymetrix data is handled.
>
> First, the special case used for Affymetrix has been replaced with a
> more generic way to support file attachments to any raw data types.
> Most of the methods in the Affymetrix class have been deprecated and
> replaced with something else (there are hints in the javadoc).
>
> Second, BASE 2.5 will store CEL, CDF and other large files in a
> compressed format. To avoid having to unpack and copy the CEL and CDF
> files each time we just want to read the first 10-20 header lines, BASE
> 2.5 will ship with a modified version of the Fusion SDK. The
> modifications have made it possible to pass a java.io.InputStream to the
> Fusion SDK instead of filenames (java.io.File).
> This may make it behave a bit differently than it did before. We have
> only modified the parts that we are using in BASE. Other parts have been
> left as they are. If you are only doing similar things as we do in the
> Affymetrix.java class it should be safe. The modified Fusion SDK can be
> found on http://trac.thep.lu.se/trac/basehacks
>
> /Nicklas
>
> _______________________________________________
> basedb-devel mailing list
> basedb-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/basedb-devel

-- 
Bob MacCallum | VectorBase Developer | Kafatos/Christophides Groups |
Division of Cell and Molecular Biology | Imperial College London |
Phone +442075941945 | Email [EMAIL PROTECTED]