Great, many thanks for the tips (works perfectly now) and the warning about
2.5 development in this area.

One array takes about 5 seconds to hash and a couple of hundred MB of RAM.  (I
have now implemented it as a Hashtable of HashSet<Integer> because one
probeset might be on two arrays.)
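
For anyone following along, here is a minimal sketch of that kind of reverse
lookup. It is not the exact code: it uses a HashMap instead of a Hashtable,
and the class name, method names, probeset names and array design ids are all
made up for illustration. In the real thing the names would come from
cdf.getProbeSetName(index) once the CDF has been read in full.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Reverse lookup: probeset name -> ids of the array designs that carry it. */
class ProbesetLookup {
    private final Map<String, Set<Integer>> designsByProbeset = new HashMap<>();

    /** Record that a probeset appears on an array design. */
    void addProbeset(String probesetName, int arrayDesignId) {
        // A set per probeset, because one probeset may be on several arrays.
        designsByProbeset
            .computeIfAbsent(probesetName, k -> new HashSet<>())
            .add(arrayDesignId);
    }

    /** Ids of all array designs carrying this probeset; empty if none known. */
    Set<Integer> designsFor(String probesetName) {
        Set<Integer> ids = designsByProbeset.get(probesetName);
        return ids == null
            ? Collections.<Integer>emptySet()
            : Collections.unmodifiableSet(ids);
    }

    public static void main(String[] args) {
        ProbesetLookup lookup = new ProbesetLookup();
        // Hypothetical probeset names and array design ids.
        lookup.addProbeset("probeset_A", 1);
        lookup.addProbeset("probeset_A", 2); // same probeset on a second array
        lookup.addProbeset("probeset_B", 2);
        System.out.println("probeset_A is on " + lookup.designsFor("probeset_A"));
    }
}
```

Storing a set per probeset keeps a second array from overwriting the first,
which is what a plain put(probesetId, adId) would do.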

Is there any possibility that Affy array designs could have their
reporters/features in the database?  I'm sorry, I don't understand Affy files
well enough to know why this can't be done in BASE as it stands.  I guess they
just don't fit into ArrayDesignBlocks?  This may have been discussed to death
elsewhere, sorry.

Thanks again!

cheers,
Bob.


Nicklas Nordborg writes:
 > > Thanks Nicklas for the help with boolean queries.  Now I'm confused by
 > > something else, also Affy-related.
 > > 
 > > I'm trying to write code to look up which ArrayDesigns a Reporter is on.
 > > For non-Affy arrays this is easy (join via arrayDesignBlocks, features and
 > > reporter).
 > > 
 > > For Affy arrays it looks like we have to use the CDF file through the Affy
 > > API: get all the reporter ids (e.g. ProbeSetNames) and put them in a hash
 > > for reverse lookup later.
 > > 
 > > Here's a code snippet to do some of that (see last email for the
 > > definition of affyQuery).  A lot of the code is taken directly from
 > > net/sf/basedb/core/Affymetrix.java, which is used by the "Affymetrix CDF
 > > probeset importer" plugin for Affy array design (see the "verify reporters"
 > > link on any Affy ArrayDesign page).  When I run this plugin as the same
 > > user who runs the code below, it works fine.  However I get a
 > > NullPointerException with my code...
 > > 
 > 
 > The Affymetrix.loadCdfFile() method only parses the headers. You must 
 > call cdf.clear() and cdf.read() to parse the entire file.
 > 
 > I have found that the Fusion SDK is unfortunately not very informative 
 > when it comes to error messages and doesn't have much error handling 
 > either.
 > 
 > >     Hashtable affyProbeLookup = new Hashtable();
 > > 
 > >     ItemResultList<ArrayDesign> affyList = affyQuery.list(dc);
 > >     for (ArrayDesign ad : affyList) {
 > >       int adId = ad.getId();
 > >       FusionCDFData cdf =
 > >         Affymetrix.loadCdfFile(Affymetrix.getCdfFile(ad));
 > >       if (cdf == null) continue;
 > >       int numProbesets = cdf.getHeader().getNumProbeSets();
 > >       int index = 0;
 > >       while (index < numProbesets) {
 > >         String probesetId = cdf.getProbeSetName(index);  // Line 96
 > >         affyProbeLookup.put(probesetId, adId);
 > >         index++;
 > >       }
 > >     }
 > > 
 > > Exception in thread "main" java.lang.NullPointerException
 > >         at affymetrix.gcos.cdf.CDFFileData.getProbeSetName(Unknown Source)
 > >         at affymetrix.fusion.cdf.FusionCDFData.getProbeSetName(Unknown Source)
 > >         at base_api_test.run_test(base_api_test.java:96)
 > >         at base_api_test.main(base_api_test.java:216)
 > > 
 > > 
 > > note: the cdf object seems OK (numProbesets is set properly).
 > > 
 > > 
 > > If anyone can point me in the right direction, it would be appreciated.
 > > It's a bit difficult not having the Affy source and line numbers
 > > (is that available?)
 > 
 > You'll have to ask Affymetrix for that. It all depends on how the 
 > package was compiled.
 > 
 > I also have to mention that the 2.5 release will have a lot of changes 
 > in how Affymetrix data is handled.
 > 
 > First, the special case used for Affymetrix has been replaced with a 
 > more generic way to support file attachments to any raw data type. 
 > Most of the methods in the Affymetrix class have been deprecated and 
 > replaced with something else (there are hints in the javadoc).
 > 
 > Second, BASE 2.5 will store CEL, CDF and other large files in a 
 > compressed format. To avoid having to unpack and copy the CEL and CDF 
 > files each time we just want to read the first 10-20 header lines, BASE 
 > 2.5 will ship with a modified version of the Fusion SDK. The 
 > modifications make it possible to pass a java.io.InputStream to the 
 > Fusion SDK instead of a filename (java.io.File). This may make it behave 
 > a bit differently than it did before. We have only modified the parts 
 > that we are using in BASE. Other parts have been left as they are. If 
 > you are only doing things similar to what we do in the Affymetrix.java 
 > class, it should be safe. The modified Fusion SDK can be found on 
 > http://trac.thep.lu.se/trac/basehacks
 > 
 > /Nicklas
 > 
 > 
 > 
 > -------------------------------------------------------------------------
 > This SF.net email is sponsored by: Splunk Inc.
 > Still grepping through log files to find problems?  Stop.
 > Now Search log events and configuration files using AJAX and a browser.
 > Download your FREE copy of Splunk now >> http://get.splunk.com/
 > _______________________________________________
 > basedb-devel mailing list
 > basedb-devel@lists.sourceforge.net
 > https://lists.sourceforge.net/lists/listinfo/basedb-devel

-- 
Bob MacCallum | VectorBase Developer | Kafatos/Christophides Groups |
Division of Cell and Molecular Biology | Imperial College London |
Phone +442075941945 | Email [EMAIL PROTECTED]
